[libvirt] memory pinning problem

Hi,

we are trying to use vCPU pinning on a 2-socket server with Intel Xeon E5620 CPUs, HT enabled and 2*6*16GiB RAM, but run into problems when we try to start a guest on the second socket:

error: Failed to start domain test
error: internal error: process exited while connecting to monitor: kvm_init_vcpu failed: Cannot allocate memory

Libvirt version 1.1.1, Linux 3.11-rc7

Because I couldn't find any other service which allowed a 7M file upload, I put the log file and everything else which could perhaps be relevant into a github repository: https://github.com/David-Weber/vcpu-pinning

When we try to start a guest on the first node, it runs fine:

<vcpu placement='static' cpuset='0-3,8-11'>4</vcpu>
<numatune>
  <memory mode='strict' nodeset='0'/>
</numatune>

Starting it on the second node fails:

<vcpu placement='static' cpuset='4-7,12-15'>4</vcpu>
<numatune>
  <memory mode='strict' nodeset='1'/>
</numatune>

Even stranger, starting it with the CPUs of the second node and the memory of the first node works:

<vcpu placement='static' cpuset='4-7,12-15'>4</vcpu>
<numatune>
  <memory mode='strict' nodeset='0'/>
</numatune>

The log file contains these three cases.

Using the placement='auto' parameter leads to the same problem: if numad returns the second node, the guest won't start.

Is this a configuration, a libvirt or a cgroup problem? :)

Cheers,
David
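The cpuset values above assume that CPUs 0-3,8-11 sit on node 0 and CPUs 4-7,12-15 on node 1; a minimal sketch of how that mapping can be confirmed on the host, assuming numactl and util-linux are installed:

# List NUMA nodes with their CPUs and per-node total/free memory
numactl --hardware

# Show each logical CPU with its node, socket and core (HT siblings share a core)
lscpu --extended=CPU,NODE,SOCKET,CORE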

On Tue, Aug 27, 2013 at 09:09:25AM +0200, David Weber wrote:
Hi,
we are trying to use vCPU pinning on a 2-socket server with Intel Xeon E5620 CPUs, HT enabled and 2*6*16GiB RAM, but run into problems when we try to start a guest on the second socket:
error: Failed to start domain test
error: internal error: process exited while connecting to monitor: kvm_init_vcpu failed: Cannot allocate memory
Libvirt version 1.1.1, Linux 3.11-rc7
Because I couldn't find any other service which allowed a 7M file upload, I put the log file and everything else which could perhaps be relevant into a github repository: https://github.com/David-Weber/vcpu-pinning
When we try to start a guest on the first node, it runs fine:
<vcpu placement='static' cpuset='0-3,8-11'>4</vcpu>
<numatune>
  <memory mode='strict' nodeset='0'/>
</numatune>
Starting it on the second node fails:
<vcpu placement='static' cpuset='4-7,12-15'>4</vcpu>
<numatune>
  <memory mode='strict' nodeset='1'/>
</numatune>
Even stranger, starting it with the CPUs of the second node and the memory of the first node works:
<vcpu placement='static' cpuset='4-7,12-15'>4</vcpu>
<numatune>
  <memory mode='strict' nodeset='0'/>
</numatune>
The log file contains these three cases.
Using the placement='auto' parameter leads to the same problem: if numad returns the second node, the guest won't start.
Is this a configuration, a libvirt or a cgroup problem? :)
With mode='strict' you are telling QEMU that if it can't allocate memory from the requested node, it should fail. Is it possible that some of your NUMA nodes have insufficient free memory? The combination of the 'virsh capabilities' output and the results of 'virsh freecell NODENUM' for each NUMA node will give an indication of the allocation state.

Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
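For a two-cell host such as this one, the checks suggested above can be run in one pass; a minimal sketch, assuming the default local libvirt connection and cell IDs 0 and 1:

# Free memory per NUMA cell as seen by libvirt
virsh freecell 0
virsh freecell 1

# Or every cell at once
virsh freecell --all

# Per-cell memory sizes and CPU ids appear in the <topology> element of the capabilities XML
virsh capabilities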

On Friday, 6 September 2013, 12:10:04, Daniel P. Berrange wrote:
On Tue, Aug 27, 2013 at 09:09:25AM +0200, David Weber wrote:
Hi,
we are trying to use vCPU pinning on a 2-socket server with Intel Xeon E5620 CPUs, HT enabled and 2*6*16GiB RAM, but run into problems when we try to start a guest on the second socket:
error: Failed to start domain test
error: internal error: process exited while connecting to monitor: kvm_init_vcpu failed: Cannot allocate memory
Libvirt version 1.1.1, Linux 3.11-rc7
Because I couldn't find any other service which allowed a 7M file upload, I put the log file and everything else which could perhaps be relevant into a github repository: https://github.com/David-Weber/vcpu-pinning
When we try to start a guest on the first node, it runs fine:
<vcpu placement='static' cpuset='0-3,8-11'>4</vcpu>
<numatune>
<memory mode='strict' nodeset='0'/>
</numatune>
Starting it on the second node fails:
<vcpu placement='static' cpuset='4-7,12-15'>4</vcpu>
<numatune>
<memory mode='strict' nodeset='1'/>
</numatune>
Even stranger, starting it with the CPUs of the second node and the memory of the first node works:
<vcpu placement='static' cpuset='4-7,12-15'>4</vcpu>
<numatune>
<memory mode='strict' nodeset='0'/>
</numatune>
The log file contains these three cases.
Using the placement='auto' parameter leads to the same problem: if numad returns the second node, the guest won't start.
Is this a configuration, a libvirt or a cgroup problem? :)
With mode='strict' you are telling QEMU that if it can't allocate memory from the requested node, it should fail. Is it possible that some of your NUMA nodes have insufficient free memory?
The combination of 'virsh capabilities' output and the results of 'virsh freecell NODENUM' for each NUMA node will give an indication of the allocation state.
Daniel
Thank you for your response. There should be plenty of free memory available.

# virsh capabilities
<capabilities>
  <host>
    <uuid>91ecf0d9-c821-c9ca-c7fa-00259064e5c6</uuid>
    <cpu>
      <arch>x86_64</arch>
      <model>Westmere</model>
      <vendor>Intel</vendor>
      <topology sockets='1' cores='4' threads='2'/>
      <feature name='rdtscp'/>
      <feature name='pdpe1gb'/>
      <feature name='dca'/>
      <feature name='pcid'/>
      <feature name='pdcm'/>
      <feature name='xtpr'/>
      <feature name='tm2'/>
      <feature name='est'/>
      <feature name='smx'/>
      <feature name='vmx'/>
      <feature name='ds_cpl'/>
      <feature name='monitor'/>
      <feature name='dtes64'/>
      <feature name='pclmuldq'/>
      <feature name='pbe'/>
      <feature name='tm'/>
      <feature name='ht'/>
      <feature name='ss'/>
      <feature name='acpi'/>
      <feature name='ds'/>
      <feature name='vme'/>
    </cpu>
    <power_management>
      <suspend_mem/>
      <suspend_disk/>
      <suspend_hybrid/>
    </power_management>
    <migration_features>
      <live/>
      <uri_transports>
        <uri_transport>tcp</uri_transport>
      </uri_transports>
    </migration_features>
    <topology>
      <cells num='2'>
        <cell id='0'>
          <memory unit='KiB'>99003008</memory>
          <cpus num='8'>
            <cpu id='0' socket_id='0' core_id='0' siblings='0,8'/>
            <cpu id='1' socket_id='0' core_id='1' siblings='1,9'/>
            <cpu id='2' socket_id='0' core_id='9' siblings='2,10'/>
            <cpu id='3' socket_id='0' core_id='10' siblings='3,11'/>
            <cpu id='8' socket_id='0' core_id='0' siblings='0,8'/>
            <cpu id='9' socket_id='0' core_id='1' siblings='1,9'/>
            <cpu id='10' socket_id='0' core_id='9' siblings='2,10'/>
            <cpu id='11' socket_id='0' core_id='10' siblings='3,11'/>
          </cpus>
        </cell>
        <cell id='1'>
          <memory unit='KiB'>99088544</memory>
          <cpus num='8'>
            <cpu id='4' socket_id='1' core_id='0' siblings='4,12'/>
            <cpu id='5' socket_id='1' core_id='1' siblings='5,13'/>
            <cpu id='6' socket_id='1' core_id='9' siblings='6,14'/>
            <cpu id='7' socket_id='1' core_id='10' siblings='7,15'/>
            <cpu id='12' socket_id='1' core_id='0' siblings='4,12'/>
            <cpu id='13' socket_id='1' core_id='1' siblings='5,13'/>
            <cpu id='14' socket_id='1' core_id='9' siblings='6,14'/>
            <cpu id='15' socket_id='1' core_id='10' siblings='7,15'/>
          </cpus>
        </cell>
      </cells>
    </topology>
    <secmodel>
      <model>none</model>
      <doi>0</doi>
    </secmodel>
    <secmodel>
      <model>dac</model>
      <doi>0</doi>
    </secmodel>
  </host>
  <guest>
    <os_type>hvm</os_type>
    <arch name='i686'>
      <wordsize>32</wordsize>
      <emulator>/usr/bin/qemu-system-x86_64</emulator>
      <machine canonical='pc-i440fx-1.5' maxCpus='255'>pc</machine>
      <machine maxCpus='255'>pc-q35-1.4</machine>
      <machine canonical='pc-q35-1.5' maxCpus='255'>q35</machine>
      <machine maxCpus='1'>isapc</machine>
      <machine maxCpus='255'>pc-0.10</machine>
      <machine maxCpus='255'>pc-0.11</machine>
      <machine maxCpus='255'>pc-0.12</machine>
      <machine maxCpus='255'>pc-0.13</machine>
      <machine maxCpus='255'>pc-0.14</machine>
      <machine maxCpus='255'>pc-0.15</machine>
      <machine maxCpus='255'>pc-1.0</machine>
      <machine maxCpus='255'>pc-1.1</machine>
      <machine maxCpus='255'>pc-1.2</machine>
      <machine maxCpus='255'>pc-1.3</machine>
      <machine maxCpus='255'>pc-i440fx-1.4</machine>
      <machine maxCpus='1'>none</machine>
      <domain type='qemu'>
      </domain>
      <domain type='kvm'>
        <emulator>/usr/bin/qemu-kvm</emulator>
        <machine canonical='pc-i440fx-1.5' maxCpus='255'>pc</machine>
        <machine maxCpus='255'>pc-q35-1.4</machine>
        <machine canonical='pc-q35-1.5' maxCpus='255'>q35</machine>
        <machine maxCpus='1'>isapc</machine>
        <machine maxCpus='255'>pc-0.10</machine>
        <machine maxCpus='255'>pc-0.11</machine>
        <machine maxCpus='255'>pc-0.12</machine>
        <machine maxCpus='255'>pc-0.13</machine>
        <machine maxCpus='255'>pc-0.14</machine>
        <machine maxCpus='255'>pc-0.15</machine>
        <machine maxCpus='255'>pc-1.0</machine>
        <machine maxCpus='255'>pc-1.1</machine>
        <machine maxCpus='255'>pc-1.2</machine>
        <machine maxCpus='255'>pc-1.3</machine>
        <machine maxCpus='255'>pc-i440fx-1.4</machine>
        <machine maxCpus='1'>none</machine>
      </domain>
    </arch>
    <features>
      <cpuselection/>
      <deviceboot/>
      <acpi default='on' toggle='yes'/>
      <apic default='on' toggle='no'/>
      <pae/>
      <nonpae/>
    </features>
  </guest>
  <guest>
    <os_type>hvm</os_type>
    <arch name='x86_64'>
      <wordsize>64</wordsize>
      <emulator>/usr/bin/qemu-system-x86_64</emulator>
      <machine canonical='pc-i440fx-1.5' maxCpus='255'>pc</machine>
      <machine maxCpus='255'>pc-q35-1.4</machine>
      <machine canonical='pc-q35-1.5' maxCpus='255'>q35</machine>
      <machine maxCpus='1'>isapc</machine>
      <machine maxCpus='255'>pc-0.10</machine>
      <machine maxCpus='255'>pc-0.11</machine>
      <machine maxCpus='255'>pc-0.12</machine>
      <machine maxCpus='255'>pc-0.13</machine>
      <machine maxCpus='255'>pc-0.14</machine>
      <machine maxCpus='255'>pc-0.15</machine>
      <machine maxCpus='255'>pc-1.0</machine>
      <machine maxCpus='255'>pc-1.1</machine>
      <machine maxCpus='255'>pc-1.2</machine>
      <machine maxCpus='255'>pc-1.3</machine>
      <machine maxCpus='255'>pc-i440fx-1.4</machine>
      <machine maxCpus='1'>none</machine>
      <domain type='qemu'>
      </domain>
      <domain type='kvm'>
        <emulator>/usr/bin/qemu-kvm</emulator>
        <machine canonical='pc-i440fx-1.5' maxCpus='255'>pc</machine>
        <machine maxCpus='255'>pc-q35-1.4</machine>
        <machine canonical='pc-q35-1.5' maxCpus='255'>q35</machine>
        <machine maxCpus='1'>isapc</machine>
        <machine maxCpus='255'>pc-0.10</machine>
        <machine maxCpus='255'>pc-0.11</machine>
        <machine maxCpus='255'>pc-0.12</machine>
        <machine maxCpus='255'>pc-0.13</machine>
        <machine maxCpus='255'>pc-0.14</machine>
        <machine maxCpus='255'>pc-0.15</machine>
        <machine maxCpus='255'>pc-1.0</machine>
        <machine maxCpus='255'>pc-1.1</machine>
        <machine maxCpus='255'>pc-1.2</machine>
        <machine maxCpus='255'>pc-1.3</machine>
        <machine maxCpus='255'>pc-i440fx-1.4</machine>
        <machine maxCpus='1'>none</machine>
      </domain>
    </arch>
    <features>
      <cpuselection/>
      <deviceboot/>
      <acpi default='on' toggle='yes'/>
      <apic default='on' toggle='no'/>
    </features>
  </guest>
</capabilities>

# virsh freecell 0
0: 86071624 KiB

# virsh freecell 1
1: 75258628 KiB

# virsh edit test
<domain type='kvm'>
  <name>test</name>
  <uuid>08cdc389-78bf-450c-89f4-b4728edabdbf</uuid>
  <memory unit='KiB'>1048576</memory>
  <currentMemory unit='KiB'>1048576</currentMemory>
  <vcpu placement='static' cpuset='4-7'>1</vcpu>
  <numatune>
    <memory mode='strict' nodeset='1'/>
  </numatune>
  <os>
    <type arch='x86_64' machine='pc-i440fx-1.5'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/bin/qemu-kvm</emulator>
    <controller type='usb' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'/>
    <controller type='ide' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <input type='mouse' bus='ps2'/>
    <graphics type='vnc' port='-1' autoport='yes'/>
    <video>
      <model type='cirrus' vram='9216' heads='1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </memballoon>
  </devices>
</domain>

# virsh start test
error: Failed to start domain test
error: internal error: process exited while connecting to monitor: kvm_init_vcpu failed: Cannot allocate memory

Allocating memory on this node with numactl works fine:

# numactl --cpubind=1 --membind=1 -- dd if=/dev/zero of=/dev/null bs=2G count=1
0+1 records in
0+1 records out
2147479552 bytes (2.1 GB) copied, 0.60816 s, 3.5 GB/s

David
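The libvirt numbers can also be cross-checked against the kernel's own per-node counters; a minimal sketch, assuming sysfs is mounted and the numastat tool from the numactl package is available:

# MemFree per NUMA node straight from sysfs
grep -H MemFree /sys/devices/system/node/node*/meminfo

# Roughly the same data, one column per node
numastat -m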

On Wed, Sep 11, 2013 at 10:47:08AM +0200, David Weber wrote:
On Friday, 6 September 2013, 12:10:04, Daniel P. Berrange wrote:
On Tue, Aug 27, 2013 at 09:09:25AM +0200, David Weber wrote:
Hi,
we are trying to use vCPU pinning on a 2-socket server with Intel Xeon E5620 CPUs, HT enabled and 2*6*16GiB RAM, but run into problems when we try to start a guest on the second socket:
error: Failed to start domain test
error: internal error: process exited while connecting to monitor: kvm_init_vcpu failed: Cannot allocate memory
# virsh freecell 0
0: 86071624 KiB
# virsh freecell 1
1: 75258628 KiB
# virsh edit test
<domain type='kvm'>
  <name>test</name>
  <uuid>08cdc389-78bf-450c-89f4-b4728edabdbf</uuid>
  <memory unit='KiB'>1048576</memory>
  <currentMemory unit='KiB'>1048576</currentMemory>
  <vcpu placement='static' cpuset='4-7'>1</vcpu>
  <numatune>
    <memory mode='strict' nodeset='1'/>
  </numatune>
  <os>
    <type arch='x86_64' machine='pc-i440fx-1.5'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/bin/qemu-kvm</emulator>
    <controller type='usb' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'/>
    <controller type='ide' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <input type='mouse' bus='ps2'/>
    <graphics type='vnc' port='-1' autoport='yes'/>
    <video>
      <model type='cirrus' vram='9216' heads='1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </memballoon>
  </devices>
</domain>
# virsh start test
error: Failed to start domain test
error: internal error: process exited while connecting to monitor: kvm_init_vcpu failed: Cannot allocate memory
Allocating memory on this node with numactl works fine:
# numactl --cpubind=1 --membind=1 -- dd if=/dev/zero of=/dev/null bs=2G count=1
0+1 records in
0+1 records out
2147479552 bytes (2.1 GB) copied, 0.60816 s, 3.5 GB/s
Hmm, this makes no sense at all to me. Your configuration looks totally valid and you have plenty of memory in both nodes.

Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On Wednesday, 11 September 2013, 11:27:30, Daniel P. Berrange wrote:
On Wed, Sep 11, 2013 at 10:47:08AM +0200, David Weber wrote:
On Friday, 6 September 2013, 12:10:04, Daniel P. Berrange wrote:
On Tue, Aug 27, 2013 at 09:09:25AM +0200, David Weber wrote:
Hi,
we are trying to use vCPU pinning on a 2-socket server with Intel Xeon E5620 CPUs, HT enabled and 2*6*16GiB RAM, but run into problems when we try to start a guest on the second socket:
error: Failed to start domain test
error: internal error: process exited while connecting to monitor: kvm_init_vcpu failed: Cannot allocate memory
# virsh freecell 0
0: 86071624 KiB
# virsh freecell 1
1: 75258628 KiB
# virsh edit test
<domain type='kvm'>
<name>test</name> <uuid>08cdc389-78bf-450c-89f4-b4728edabdbf</uuid> <memory unit='KiB'>1048576</memory> <currentMemory unit='KiB'>1048576</currentMemory> <vcpu placement='static' cpuset='4-7'>1</vcpu> <numatune>
<memory mode='strict' nodeset='1'/>
</numatune> <os>
<type arch='x86_64' machine='pc-i440fx-1.5'>hvm</type> <boot dev='hd'/>
</os> <features>
<acpi/> <apic/> <pae/>
</features> <clock offset='utc'/> <on_poweroff>destroy</on_poweroff> <on_reboot>restart</on_reboot> <on_crash>restart</on_crash> <devices>
<emulator>/usr/bin/qemu-kvm</emulator> <controller type='usb' index='0'>
<address type='pci' domain='0x0000' bus='0x00' slot='0x01'
function='0x2'/>
</controller> <controller type='pci' index='0' model='pci-root'/> <controller type='ide' index='0'>
<address type='pci' domain='0x0000' bus='0x00' slot='0x01'
function='0x1'/>
</controller> <input type='mouse' bus='ps2'/> <graphics type='vnc' port='-1' autoport='yes'/> <video>
<model type='cirrus' vram='9216' heads='1'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x02'
function='0x0'/>
</video> <memballoon model='virtio'>
<address type='pci' domain='0x0000' bus='0x00' slot='0x03'
function='0x0'/>
</memballoon>
</devices>
</domain>
# virsh start test
error: Failed to start domain test
error: internal error: process exited while connecting to monitor: kvm_init_vcpu failed: Cannot allocate memory
Allocating memory on this node with numactl works fine:
# numactl --cpubind=1 --membind=1 -- dd if=/dev/zero of=/dev/null bs=2G count=1
0+1 records in
0+1 records out
2147479552 bytes (2.1 GB) copied, 0.60816 s, 3.5 GB/s
Hmm, this makes no sense at all to me. Your configuration looks totally valid and you have plenty of memory in both nodes.
It also fails if I try to assign it to another node through the cgroup interface, so it is probably a kernel issue and not a libvirt issue:

# /bin/echo 1 > /sys/fs/cgroup/cpuset/machine/Ubuntu.libvirt-qemu/cpuset.mems
/bin/echo: write error: Device or resource busy

It has also been reported here: https://bugzilla.redhat.com/show_bug.cgi?id=920406

David
Daniel
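One way to narrow down where that write error comes from is to dump the mems mask at every level of the guest's cpuset hierarchy; a minimal sketch, assuming the cpuset controller is mounted at /sys/fs/cgroup/cpuset and the guest group is the one named in the error above:

# Print cpuset.mems for the guest group and each of its child groups
find /sys/fs/cgroup/cpuset/machine/Ubuntu.libvirt-qemu -name cpuset.mems | \
    while read f; do echo "$f: $(cat "$f")"; done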

On Wednesday, 11 September 2013, 11:27:30, Daniel P. Berrange wrote:
On Wed, Sep 11, 2013 at 10:47:08AM +0200, David Weber wrote:
On Friday, 6 September 2013, 12:10:04, Daniel P. Berrange wrote:
On Tue, Aug 27, 2013 at 09:09:25AM +0200, David Weber wrote:
Hi,
we are trying to use vCPU pinning on a 2-socket server with Intel Xeon E5620 CPUs, HT enabled and 2*6*16GiB RAM, but run into problems when we try to start a guest on the second socket:
error: Failed to start domain test
error: internal error: process exited while connecting to monitor: kvm_init_vcpu failed: Cannot allocate memory
# virsh freecell 0
0: 86071624 KiB
# virsh freecell 1
1: 75258628 KiB
# virsh edit test
<domain type='kvm'>
<name>test</name> <uuid>08cdc389-78bf-450c-89f4-b4728edabdbf</uuid> <memory unit='KiB'>1048576</memory> <currentMemory unit='KiB'>1048576</currentMemory> <vcpu placement='static' cpuset='4-7'>1</vcpu> <numatune>
<memory mode='strict' nodeset='1'/>
</numatune> <os>
<type arch='x86_64' machine='pc-i440fx-1.5'>hvm</type> <boot dev='hd'/>
</os> <features>
<acpi/> <apic/> <pae/>
</features> <clock offset='utc'/> <on_poweroff>destroy</on_poweroff> <on_reboot>restart</on_reboot> <on_crash>restart</on_crash> <devices>
<emulator>/usr/bin/qemu-kvm</emulator> <controller type='usb' index='0'>
<address type='pci' domain='0x0000' bus='0x00' slot='0x01'
function='0x2'/>
</controller> <controller type='pci' index='0' model='pci-root'/> <controller type='ide' index='0'>
<address type='pci' domain='0x0000' bus='0x00' slot='0x01'
function='0x1'/>
</controller> <input type='mouse' bus='ps2'/> <graphics type='vnc' port='-1' autoport='yes'/> <video>
<model type='cirrus' vram='9216' heads='1'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x02'
function='0x0'/>
</video> <memballoon model='virtio'>
<address type='pci' domain='0x0000' bus='0x00' slot='0x03'
function='0x0'/>
</memballoon>
</devices>
</domain>
# virsh start test
error: Failed to start domain test
error: internal error: process exited while connecting to monitor: kvm_init_vcpu failed: Cannot allocate memory
Allocating memory on this node with numactl works fine:
# numactl --cpubind=1 --membind=1 -- dd if=/dev/zero of=/dev/null bs=2G count=1
0+1 records in
0+1 records out
2147479552 bytes (2.1 GB) copied, 0.60816 s, 3.5 GB/s
Hmm, this makes no sense at all to me. Your configuration looks totally valid and you have plenty of memory in both nodes.
After reading a bit more about cgroups, I now think I know what's going on. Let's assume we have a 2-node dual-core system and start a guest named 'test' without CPU or memory pinning.

* libvirt creates a cgroup under cpuset/machine/test.libvirt-qemu:
  cpuset/machine/test.libvirt-qemu/cpuset.cpus -> 0-3
  cpuset/machine/test.libvirt-qemu/cpuset.mems -> 0-1
* libvirt creates a cgroup for every vcpu:
  cpuset/machine/test.libvirt-qemu/vcpu*/cpuset.cpus -> 0-3
  cpuset/machine/test.libvirt-qemu/vcpu*/cpuset.mems -> 0-1
* libvirt creates a cgroup for the qemu emulator:
  cpuset/machine/test.libvirt-qemu/emulator/cpuset.cpus -> 0-3
  cpuset/machine/test.libvirt-qemu/emulator/cpuset.mems -> 0-1

Now we want to pin the guest to the second node:

virsh # numatune test --nodeset 1
error: Unable to change numa parameters
error: Unable to write to '/sys/fs/cgroup/cpuset/machine/Ubuntu.libvirt-qemu/cpuset.mems': Device or resource busy

What happens is that libvirt tries to set cpuset/machine/test.libvirt-qemu/cpuset.mems to 1, but this is not possible because cpuset/machine/test.libvirt-qemu/vcpu*/cpuset.mems and cpuset/machine/test.libvirt-qemu/emulator/cpuset.mems still contain 0-1. Libvirt has to change these values first!

If I do this manually, numatune runs fine:

# echo 1 > vcpu0/cpuset.mems
# echo 1 > emulator/cpuset.mems
# virsh numatune test --nodeset 1

Cheers,
David
Daniel
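The manual fix above generalizes to a small script; a sketch only, assuming the cpuset v1 layout described in this message, a guest cgroup named test.libvirt-qemu, and root privileges:

GROUP=/sys/fs/cgroup/cpuset/machine/test.libvirt-qemu
NODESET=1

# Children first: a child's cpuset.mems must remain a subset of its parent's,
# so the vcpu and emulator groups have to be narrowed before the parent.
for child in "$GROUP"/vcpu* "$GROUP"/emulator; do
    echo "$NODESET" > "$child/cpuset.mems"
done

# With the children narrowed, the parent write (or virsh numatune) succeeds.
echo "$NODESET" > "$GROUP/cpuset.mems"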

On Thu, Sep 19, 2013 at 01:26:52PM +0200, David Weber wrote:
On Wednesday, 11 September 2013, 11:27:30, Daniel P. Berrange wrote:
On Wed, Sep 11, 2013 at 10:47:08AM +0200, David Weber wrote:
On Friday, 6 September 2013, 12:10:04, Daniel P. Berrange wrote:
On Tue, Aug 27, 2013 at 09:09:25AM +0200, David Weber wrote:
Hi,
we are trying to use vCPU pinning on a 2-socket server with Intel Xeon E5620 CPUs, HT enabled and 2*6*16GiB RAM, but run into problems when we try to start a guest on the second socket:
error: Failed to start domain test
error: internal error: process exited while connecting to monitor: kvm_init_vcpu failed: Cannot allocate memory
# virsh freecell 0
0: 86071624 KiB
# virsh freecell 1
1: 75258628 KiB
# virsh edit test
<domain type='kvm'>
<name>test</name> <uuid>08cdc389-78bf-450c-89f4-b4728edabdbf</uuid> <memory unit='KiB'>1048576</memory> <currentMemory unit='KiB'>1048576</currentMemory> <vcpu placement='static' cpuset='4-7'>1</vcpu> <numatune>
<memory mode='strict' nodeset='1'/>
</numatune> <os>
<type arch='x86_64' machine='pc-i440fx-1.5'>hvm</type> <boot dev='hd'/>
</os> <features>
<acpi/> <apic/> <pae/>
</features> <clock offset='utc'/> <on_poweroff>destroy</on_poweroff> <on_reboot>restart</on_reboot> <on_crash>restart</on_crash> <devices>
<emulator>/usr/bin/qemu-kvm</emulator> <controller type='usb' index='0'>
<address type='pci' domain='0x0000' bus='0x00' slot='0x01'
function='0x2'/>
</controller> <controller type='pci' index='0' model='pci-root'/> <controller type='ide' index='0'>
<address type='pci' domain='0x0000' bus='0x00' slot='0x01'
function='0x1'/>
</controller> <input type='mouse' bus='ps2'/> <graphics type='vnc' port='-1' autoport='yes'/> <video>
<model type='cirrus' vram='9216' heads='1'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x02'
function='0x0'/>
</video> <memballoon model='virtio'>
<address type='pci' domain='0x0000' bus='0x00' slot='0x03'
function='0x0'/>
</memballoon>
</devices>
</domain>
# virsh start test
error: Failed to start domain test
error: internal error: process exited while connecting to monitor: kvm_init_vcpu failed: Cannot allocate memory
Allocating memory on this node with numactl works fine:
# numactl --cpubind=1 --membind=1 -- dd if=/dev/zero of=/dev/null bs=2G count=1
0+1 records in
0+1 records out
2147479552 bytes (2.1 GB) copied, 0.60816 s, 3.5 GB/s
Hmm, this makes no sense at all to me. Your configuration looks totally valid and you have plenty of memory in both nodes.
After reading a bit more about cgroups, I now think I know what's going on.
Let's assume we have a 2-node dual-core system and start a guest named 'test' without CPU or memory pinning.
* libvirt creates a cgroup under cpuset/machine/test.libvirt-qemu:
  cpuset/machine/test.libvirt-qemu/cpuset.cpus -> 0-3
  cpuset/machine/test.libvirt-qemu/cpuset.mems -> 0-1
* libvirt creates a cgroup for every vcpu:
  cpuset/machine/test.libvirt-qemu/vcpu*/cpuset.cpus -> 0-3
  cpuset/machine/test.libvirt-qemu/vcpu*/cpuset.mems -> 0-1
* libvirt creates a cgroup for the qemu emulator:
  cpuset/machine/test.libvirt-qemu/emulator/cpuset.cpus -> 0-3
  cpuset/machine/test.libvirt-qemu/emulator/cpuset.mems -> 0-1
Now we want to pin the guest to the second node:
virsh # numatune test --nodeset 1
error: Unable to change numa parameters
error: Unable to write to '/sys/fs/cgroup/cpuset/machine/Ubuntu.libvirt-qemu/cpuset.mems': Device or resource busy
What happens is that libvirt tries to set cpuset/machine/test.libvirt-qemu/cpuset.mems to 1, but this is not possible because cpuset/machine/test.libvirt-qemu/vcpu*/cpuset.mems and cpuset/machine/test.libvirt-qemu/emulator/cpuset.mems still contain 0-1. Libvirt has to change these values first!
Oooh, interesting hypothesis. I wonder if this is a kernel behaviour change. I'm fairly sure that in the past, if you removed a cpu from the cpuset mask, it would automagically purge it from all children.

Please file a bug about this - it should be possible to make libvirt do the right thing and purge child masks explicitly first.

Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
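Whether a kernel narrows child masks automatically can be tested without libvirt at all; a minimal sketch that builds a throwaway cpuset hierarchy by hand (the 'demo' name is made up; assumes a two-node host, the cpuset controller mounted at /sys/fs/cgroup/cpuset, and root privileges):

# Parent and child spanning both nodes and four CPUs
mkdir -p /sys/fs/cgroup/cpuset/demo/child
echo 0-3 > /sys/fs/cgroup/cpuset/demo/cpuset.cpus
echo 0-1 > /sys/fs/cgroup/cpuset/demo/cpuset.mems
echo 0-3 > /sys/fs/cgroup/cpuset/demo/child/cpuset.cpus
echo 0-1 > /sys/fs/cgroup/cpuset/demo/child/cpuset.mems

# Try to shrink the parent while the child still spans both nodes;
# on the kernels discussed in this thread this fails with EBUSY
# rather than purging node 0 from the child.
echo 1 > /sys/fs/cgroup/cpuset/demo/cpuset.mems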

On Thursday, 19 September 2013, 12:33:21, Daniel P. Berrange wrote:
On Thu, Sep 19, 2013 at 01:26:52PM +0200, David Weber wrote:
On Wednesday, 11 September 2013, 11:27:30, Daniel P. Berrange wrote:
On Wed, Sep 11, 2013 at 10:47:08AM +0200, David Weber wrote:
On Friday, 6 September 2013, 12:10:04, Daniel P. Berrange wrote:
On Tue, Aug 27, 2013 at 09:09:25AM +0200, David Weber wrote:
Hi,
we are trying to use vCPU pinning on a 2-socket server with Intel Xeon E5620 CPUs, HT enabled and 2*6*16GiB RAM, but run into problems when we try to start a guest on the second socket:
error: Failed to start domain test
error: internal error: process exited while connecting to monitor: kvm_init_vcpu failed: Cannot allocate memory
# virsh freecell 0
0: 86071624 KiB
# virsh freecell 1
1: 75258628 KiB
# virsh edit test
<domain type='kvm'>
<name>test</name> <uuid>08cdc389-78bf-450c-89f4-b4728edabdbf</uuid> <memory unit='KiB'>1048576</memory> <currentMemory unit='KiB'>1048576</currentMemory> <vcpu placement='static' cpuset='4-7'>1</vcpu> <numatune>
<memory mode='strict' nodeset='1'/>
</numatune> <os>
<type arch='x86_64' machine='pc-i440fx-1.5'>hvm</type> <boot dev='hd'/>
</os> <features>
<acpi/> <apic/> <pae/>
</features> <clock offset='utc'/> <on_poweroff>destroy</on_poweroff> <on_reboot>restart</on_reboot> <on_crash>restart</on_crash> <devices>
<emulator>/usr/bin/qemu-kvm</emulator> <controller type='usb' index='0'>
<address type='pci' domain='0x0000' bus='0x00' slot='0x01'
function='0x2'/>
</controller> <controller type='pci' index='0' model='pci-root'/> <controller type='ide' index='0'>
<address type='pci' domain='0x0000' bus='0x00' slot='0x01'
function='0x1'/>
</controller> <input type='mouse' bus='ps2'/> <graphics type='vnc' port='-1' autoport='yes'/> <video>
<model type='cirrus' vram='9216' heads='1'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x02'
function='0x0'/>
</video> <memballoon model='virtio'>
<address type='pci' domain='0x0000' bus='0x00' slot='0x03'
function='0x0'/>
</memballoon>
</devices>
</domain>
# virsh start test
error: Failed to start domain test
error: internal error: process exited while connecting to monitor: kvm_init_vcpu failed: Cannot allocate memory
Allocating memory on this node with numactl works fine:
# numactl --cpubind=1 --membind=1 -- dd if=/dev/zero of=/dev/null bs=2G count=1
0+1 records in
0+1 records out
2147479552 bytes (2.1 GB) copied, 0.60816 s, 3.5 GB/s
Hmm, this makes no sense at all to me. Your configuration looks totally valid and you have plenty of memory in both nodes.
After reading a bit more about cgroups, I now think I know what's going on.
Let's assume we have a 2-node dual-core system and start a guest named 'test' without CPU or memory pinning.
* libvirt creates a cgroup under cpuset/machine/test.libvirt-qemu:
  cpuset/machine/test.libvirt-qemu/cpuset.cpus -> 0-3
  cpuset/machine/test.libvirt-qemu/cpuset.mems -> 0-1
* libvirt creates a cgroup for every vcpu:
  cpuset/machine/test.libvirt-qemu/vcpu*/cpuset.cpus -> 0-3
  cpuset/machine/test.libvirt-qemu/vcpu*/cpuset.mems -> 0-1
* libvirt creates a cgroup for the qemu emulator:
  cpuset/machine/test.libvirt-qemu/emulator/cpuset.cpus -> 0-3
  cpuset/machine/test.libvirt-qemu/emulator/cpuset.mems -> 0-1
Now we want to pin the guest to the second node:
virsh # numatune test --nodeset 1
error: Unable to change numa parameters
error: Unable to write to '/sys/fs/cgroup/cpuset/machine/Ubuntu.libvirt-qemu/cpuset.mems': Device or resource busy
What happens is that libvirt tries to set cpuset/machine/test.libvirt-qemu/cpuset.mems to 1, but this is not possible because cpuset/machine/test.libvirt-qemu/vcpu*/cpuset.mems and cpuset/machine/test.libvirt-qemu/emulator/cpuset.mems still contain 0-1. Libvirt has to change these values first!
Oooh, interesting hypothesis. I wonder if this is a kernel behaviour change. I'm fairly sure that in the past, if you removed a cpu from the cpuset mask, it would automagically purge it from all children.
Please file a bug about this - it should be possible to make libvirt do the right thing and purge child masks explicitly first.
Done: https://bugzilla.redhat.com/show_bug.cgi?id=1009880
I have also tested Linux 3.2.51, so the change would have had to happen quite some time ago.

Cheers,
David
participants (2):
- Daniel P. Berrange
- David Weber