[libvirt] CPU topology 'sockets' handling guest vs host

On my x86_64 host I have a pair of quad-core CPUs, each in a separate NUMA node. The virsh capabilities topology data reports this:

  # virsh capabilities | xmllint --xpath /capabilities/host/cpu -
  <cpu>
    <arch>x86_64</arch>
    <model>Opteron_G3</model>
    <vendor>AMD</vendor>
    <topology sockets="1" cores="4" threads="1"/>
    <feature name="osvw"/>
    <feature name="3dnowprefetch"/>
    <feature name="cr8legacy"/>
    <feature name="extapic"/>
    <feature name="cmp_legacy"/>
    <feature name="3dnow"/>
    <feature name="3dnowext"/>
    <feature name="pdpe1gb"/>
    <feature name="fxsr_opt"/>
    <feature name="mmxext"/>
    <feature name="ht"/>
    <feature name="vme"/>
  </cpu>

  # virsh capabilities | xmllint --xpath /capabilities/host/topology -
  <topology>
    <cells num="2">
      <cell id="0">
        <cpus num="4">
          <cpu id="0"/>
          <cpu id="1"/>
          <cpu id="2"/>
          <cpu id="3"/>
        </cpus>
      </cell>
      <cell id="1">
        <cpus num="4">
          <cpu id="4"/>
          <cpu id="5"/>
          <cpu id="6"/>
          <cpu id="7"/>
        </cpus>
      </cell>
    </cells>
  </topology>

Note, it is reporting sockets=1, because sockets is the number of sockets *per* NUMA node.

Now I try to configure the guest to match the host using:

  <cpu>
    <topology sockets='1' cores='4' threads='1'/>
    <numa>
      <cell cpus='0-3' memory='512000'/>
      <cell cpus='4-7' memory='512000'/>
    </numa>
  </cpu>

And I get:

  error: Maximum CPUs greater than topology limit

So, the XML checker is mistaking 'sockets' for the total number of sockets, rather than the per-node socket count. We need to fix this bogus check.

Daniel
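For context, here is a minimal C sketch of the kind of check that produces this error. This is not libvirt's actual code; the function name and its shape are hypothetical. It just illustrates what goes wrong when sockets*cores*threads is treated as the guest-global CPU limit while 'sockets' is actually per NUMA node:

#include <stdio.h>

/* Hypothetical sketch of the failing check: it computes the topology
 * limit as sockets * cores * threads, i.e. it treats 'sockets' as the
 * total socket count rather than sockets per NUMA node. */
static int
checkVcpusAgainstTopology(unsigned int maxvcpus,
                          unsigned int sockets,
                          unsigned int cores,
                          unsigned int threads)
{
    if (maxvcpus > sockets * cores * threads) {
        fprintf(stderr, "error: Maximum CPUs greater than topology limit\n");
        return -1;
    }
    return 0;
}

int main(void)
{
    /* The guest XML above: 8 vcpus vs sockets=1, cores=4, threads=1.
     * 8 > 1*4*1, so the guest is rejected even though its topology
     * matches the host exactly. */
    return checkVcpusAgainstTopology(8, 1, 4, 1) < 0 ? 1 : 0;
}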

On Mon, Mar 26, 2012 at 15:42:58 +0100, Daniel P. Berrange wrote:
> On my x86_64 host I have a pair of quad-core CPUs, each in a separate
> NUMA node. The virsh capabilities topology data reports this:
>
> [capabilities XML snipped]
>
> Note, it is reporting sockets=1, because sockets is the number of
> sockets *per* NUMA node.
>
> Now I try to configure the guest to match the host using:
>
>   <cpu>
>     <topology sockets='1' cores='4' threads='1'/>
>     <numa>
>       <cell cpus='0-3' memory='512000'/>
>       <cell cpus='4-7' memory='512000'/>
>     </numa>
>   </cpu>
>
> And I get:
>
>   error: Maximum CPUs greater than topology limit
>
> So, the XML checker is mistaking 'sockets' for the total number of
> sockets, rather than the per-node socket count. We need to fix this
> bogus check.
I guess what we actually want to do is to report the total number of sockets in the host cpu XML. Sockets per NUMA node has been proven to be a bad decision and we should not let it infect other areas.

Jirka

On Mon, Mar 26, 2012 at 05:08:05PM +0200, Jiri Denemark wrote:
> On Mon, Mar 26, 2012 at 15:42:58 +0100, Daniel P. Berrange wrote:
> > [...]
> >
> > So, the XML checker is mistaking 'sockets' for the total number of
> > sockets, rather than the per-node socket count. We need to fix this
> > bogus check.
>
> I guess what we actually want to do is to report the total number of
> sockets in the host cpu XML. Sockets per NUMA node has been proven to
> be a bad decision and we should not let it infect other areas.
No, we can't change that - we explicitly fixed that a while back, because it breaks the VIR_NODEINFO_MAXCPUS macro to do that.

  commit ac9dd4a676f21b5e3ca6dbe0526f2a6709072beb
  Author: Jiri Denemark <jdenemar@redhat.com>
  Date:   Wed Nov 24 11:25:19 2010 +0100

      Fix host CPU counting on unusual NUMA topologies

      The nodeinfo structure includes

          nodes   : the number of NUMA cell, 1 for uniform mem access
          sockets : number of CPU socket per node
          cores   : number of core per socket
          threads : number of threads per core

      which does not work well for NUMA topologies where each node does
      not consist of integral number of CPU sockets.

      We also have VIR_NODEINFO_MAXCPUS macro in public libvirt.h which
      computes maximum number of CPUs as (nodes * sockets * cores *
      threads).

      As a result, we can't just change sockets to report total number
      of sockets instead of sockets per node. This would probably be
      the easiest since I doubt anyone is using the field directly. But
      because of the macro, some apps might be using sockets indirectly.

      This patch leaves sockets to be the number of CPU sockets per node
      (and fixes qemu driver to comply with this) on machines where
      sockets can be divided by nodes. If we can't divide sockets by
      nodes, we behave as if there was just one NUMA node containing all
      sockets. Apps interested in NUMA should consult capabilities XML,
      which is what they probably do anyway.

      This way, the only case in which apps that care about NUMA may
      break is on machines with funky NUMA topology. And there is a
      chance libvirt wasn't able to start any guests on those machines
      anyway (although it depends on the topology, total number of CPUs
      and kernel version). Nothing changes at all for apps that don't
      care about NUMA.

Daniel
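The macro dependency is easy to see from the public API. Here is a minimal, self-contained example using the real virNodeGetInfo() and VIR_NODEINFO_MAXCPUS(); only the connection URI is an assumption (any hypervisor URI would do):

/* Build with: gcc -o maxcpus maxcpus.c $(pkg-config --cflags --libs libvirt) */
#include <stdio.h>
#include <libvirt/libvirt.h>

int main(void)
{
    virConnectPtr conn;
    virNodeInfo info;

    if (!(conn = virConnectOpen("qemu:///system"))) /* assumed URI */
        return 1;

    if (virNodeGetInfo(conn, &info) < 0) {
        virConnectClose(conn);
        return 1;
    }

    /* VIR_NODEINFO_MAXCPUS multiplies nodes * sockets * cores * threads,
     * which is why 'sockets' in the nodeinfo struct must stay per-node:
     * on the host above, 2 nodes * 1 socket * 4 cores * 1 thread = 8. */
    printf("nodes=%u sockets=%u cores=%u threads=%u -> max CPUs %u\n",
           info.nodes, info.sockets, info.cores, info.threads,
           VIR_NODEINFO_MAXCPUS(info));

    virConnectClose(conn);
    return 0;
}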

On Mon, Mar 26, 2012 at 16:11:08 +0100, Daniel P. Berrange wrote:
> On Mon, Mar 26, 2012 at 05:08:05PM +0200, Jiri Denemark wrote:
> > On Mon, Mar 26, 2012 at 15:42:58 +0100, Daniel P. Berrange wrote:
> > > So, the XML checker is mistaking 'sockets' for the total number of
> > > sockets, rather than the per-node socket count. We need to fix this
> > > bogus check.
> >
> > I guess what we actually want to do is to report the total number of
> > sockets in the host cpu XML. Sockets per NUMA node has been proven to
> > be a bad decision and we should not let it infect other areas.
>
> No, we can't change that - we explicitly fixed that a while back,
> because it breaks the VIR_NODEINFO_MAXCPUS macro to do that.
>
>   commit ac9dd4a676f21b5e3ca6dbe0526f2a6709072beb
>   Author: Jiri Denemark <jdenemar@redhat.com>
>   Date:   Wed Nov 24 11:25:19 2010 +0100
>
>       Fix host CPU counting on unusual NUMA topologies
Yes, this is the proof of "sockets per node is a bad thing" I was referring to :-) That design broke on some funky NUMA topologies where NUMA nodes were not composed of sockets (they were rather composed of cores). The ideal fix would have been to provide the total number of sockets, but we couldn't do it because the VIR_NODEINFO_MAXCPUS macro used nodes * sockets in its computation. So instead, we now pretend (in the nodeinfo structure) there is just 1 NUMA node, but for better backward compatibility we only do so when we can't divide sockets by nodes.

We should have more freedom to fix the XML (and provide the total number of sockets there) since we don't have any macro that would take the XML and compute the total number of CPUs from it. We should really try hard to avoid using sockets per node in guest XML, since that would unnecessarily limit usable topologies.

One way is to make CPU topology completely separate from NUMA, i.e., sockets would mean the total number of sockets the guest will see (I deliberately used a topology that cannot be represented with sockets-per-node semantics):

  <vcpu>12</vcpu>
  <cpu>
    <topology sockets='3' cores='4' threads='1'/>
    <numa>
      <cell cpus='0-2' memory='512000'/>
      <cell cpus='3-5' memory='512000'/>
      <cell cpus='6-8' memory='512000'/>
      <cell cpus='9-11' memory='512000'/>
    </numa>
  </cpu>

or, alternatively, make CPU topology describe only a single socket (i.e., sockets must always be 1) and the total number of CPUs would only be provided by <vcpu/>:

  <vcpu>12</vcpu>
  <cpu>
    <topology sockets='1' cores='4' threads='1'/>
    <numa>
      <cell cpus='0-2' memory='512000'/>
      <cell cpus='3-5' memory='512000'/>
      <cell cpus='6-8' memory='512000'/>
      <cell cpus='9-11' memory='512000'/>
    </numa>
  </cpu>

I think the latter (sockets is always 1) is not good either, since it would be incompatible with numerous existing guest XMLs and it is also harder to deal with within libvirt.

Jirka
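To make the "cannot be represented with sockets-per-node semantics" point concrete, here is a sketch (not libvirt's actual code; the function name is hypothetical) of the fallback behavior commit ac9dd4a describes: keep sockets per node when the total divides evenly across NUMA nodes, otherwise pretend there is a single node holding all sockets:

#include <stdio.h>

/* Hypothetical sketch of the nodeinfo fallback: per-node sockets when
 * divisible, otherwise collapse to a single NUMA node. */
static void
fillNodeinfoTopology(unsigned int totalSockets, unsigned int numaNodes,
                     unsigned int *nodes, unsigned int *sockets)
{
    if (numaNodes > 0 && totalSockets % numaNodes == 0) {
        *nodes = numaNodes;
        *sockets = totalSockets / numaNodes; /* sockets per node */
    } else {
        *nodes = 1;               /* fall back to one big node */
        *sockets = totalSockets;
    }
}

int main(void)
{
    unsigned int nodes, sockets;

    /* The 12-vcpu example above: 3 sockets spread over 4 NUMA nodes.
     * 3 % 4 != 0, so a per-node representation is impossible and the
     * fallback reports nodes=1, sockets=3. */
    fillNodeinfoTopology(3, 4, &nodes, &sockets);
    printf("nodes=%u sockets(per node)=%u\n", nodes, sockets);
    return 0;
}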

On Mon, Mar 26, 2012 at 9:42 AM, Daniel P. Berrange <berrange@redhat.com> wrote:
> On my x86_64 host I have a pair of quad-core CPUs, each in a separate
> NUMA node. [...]
>
> And I get:
>
>   error: Maximum CPUs greater than topology limit
>
> So, the XML checker is mistaking 'sockets' for the total number of
> sockets, rather than the per-node socket count. We need to fix this
> bogus check.
Has this been fixed in either 0.9.12 or master? Last I checked in 0.9.10 it was not, and there have since been a lot of changes regarding NUMA (mostly numad related, but still).

Thanks.

--
Doug Goldstein