[libvirt] [PATCHv2 0/3] Use virNodeGetCPUMap where appropriate

This series concludes the introduction of the virNodeGetCPUMap API by
replacing calls to virNodeGetInfo used only for the purpose of computing
the maximum number of node CPUs (which has the potential to yield an
incorrect number). Most prominently, with patch 3/3 the output of virsh
vcpuinfo will now be correct for domains on hosts with offline CPUs.

V2 Changes: Rework based on Eric's feedback:
 - Use nodeGetCPUCount instead of nodeGetCPUMap
 - Avoid code bloat by computing node CPU count in a helper function

Viktor Mihajlovski (3):
  qemu, lxc: Change host CPU detection logic.
  python: Use virNodeGetCPUMap where possible
  virsh: Use virNodeGetCPUMap if possible

 python/libvirt-override.c | 87 ++++++++++++++++++++++++++++-------------------
 src/lxc/lxc_controller.c  |  8 ++---
 src/qemu/qemu_driver.c    | 14 +++-----
 src/qemu/qemu_process.c   |  8 ++---
 tools/virsh-domain.c      | 32 ++++++++++++-----
 5 files changed, 86 insertions(+), 63 deletions(-)

-- 
1.7.12.4

The drivers for QEMU and LXC use virNodeGetInfo to determine the number
of host CPUs. This approach can lead to a wrong (too small) number if
one or more CPUs are offline. It is better to use nodeGetCPUCount on
Linux hosts.
---
V2 Changes: Use nodeGetCPUCount as suggested by Eric, nodeGetCPUMap is
too "heavyweight".

 src/lxc/lxc_controller.c |  8 +++-----
 src/qemu/qemu_driver.c   | 14 +++++---------
 src/qemu/qemu_process.c  |  8 +++-----
 3 files changed, 11 insertions(+), 19 deletions(-)

diff --git a/src/lxc/lxc_controller.c b/src/lxc/lxc_controller.c
index a41c903..ed3d3d0 100644
--- a/src/lxc/lxc_controller.c
+++ b/src/lxc/lxc_controller.c
@@ -492,17 +492,15 @@ static int virLXCControllerSetupNUMAPolicy(virLXCControllerPtr ctrl)
 static int virLXCControllerSetupCpuAffinity(virLXCControllerPtr ctrl)
 {
     int hostcpus, maxcpu = CPU_SETSIZE;
-    virNodeInfo nodeinfo;
     virBitmapPtr cpumap, cpumapToSet;
 
     VIR_DEBUG("Setting CPU affinity");
 
-    if (nodeGetInfo(NULL, &nodeinfo) < 0)
-        return -1;
-
     /* setaffinity fails if you set bits for CPUs which
      * aren't present, so we have to limit ourselves */
-    hostcpus = VIR_NODEINFO_MAXCPUS(nodeinfo);
+    if ((hostcpus = nodeGetCPUCount()) < 0)
+        return -1;
+
     if (maxcpu > hostcpus)
         maxcpu = hostcpus;
 
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 3980c10..6ae33d3 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -4097,7 +4097,6 @@ qemudDomainGetVcpuPinInfo(virDomainPtr dom,
     struct qemud_driver *driver = dom->conn->privateData;
     virDomainObjPtr vm = NULL;
-    virNodeInfo nodeinfo;
     virDomainDefPtr targetDef = NULL;
     int ret = -1;
     int maxcpu, hostcpus, vcpu, pcpu;
@@ -4133,9 +4132,9 @@ qemudDomainGetVcpuPinInfo(virDomainPtr dom,
     /* Coverity didn't realize that targetDef must be set if we got here.
      */
     sa_assert(targetDef);
 
-    if (nodeGetInfo(dom->conn, &nodeinfo) < 0)
+    if ((hostcpus = nodeGetCPUCount()) < 0)
         goto cleanup;
-    hostcpus = VIR_NODEINFO_MAXCPUS(nodeinfo);
+
     maxcpu = maplen * 8;
     if (maxcpu > hostcpus)
         maxcpu = hostcpus;
@@ -4348,7 +4347,6 @@ qemudDomainGetEmulatorPinInfo(virDomainPtr dom,
 {
     struct qemud_driver *driver = dom->conn->privateData;
     virDomainObjPtr vm = NULL;
-    virNodeInfo nodeinfo;
     virDomainDefPtr targetDef = NULL;
     int ret = -1;
     int maxcpu, hostcpus, pcpu;
@@ -4381,9 +4379,9 @@ qemudDomainGetEmulatorPinInfo(virDomainPtr dom,
     /* Coverity didn't realize that targetDef must be set if we got here. */
     sa_assert(targetDef);
 
-    if (nodeGetInfo(dom->conn, &nodeinfo) < 0)
+    if ((hostcpus = nodeGetCPUCount()) < 0)
         goto cleanup;
-    hostcpus = VIR_NODEINFO_MAXCPUS(nodeinfo);
+
     maxcpu = maplen * 8;
     if (maxcpu > hostcpus)
         maxcpu = hostcpus;
@@ -4425,7 +4423,6 @@ qemudDomainGetVcpus(virDomainPtr dom,
                     int maplen)
 {
     struct qemud_driver *driver = dom->conn->privateData;
     virDomainObjPtr vm;
-    virNodeInfo nodeinfo;
     int i, v, maxcpu, hostcpus;
     int ret = -1;
     qemuDomainObjPrivatePtr priv;
@@ -4451,10 +4448,9 @@ qemudDomainGetVcpus(virDomainPtr dom,
 
     priv = vm->privateData;
 
-    if (nodeGetInfo(dom->conn, &nodeinfo) < 0)
+    if ((hostcpus = nodeGetCPUCount()) < 0)
         goto cleanup;
 
-    hostcpus = VIR_NODEINFO_MAXCPUS(nodeinfo);
     maxcpu = maplen * 8;
     if (maxcpu > hostcpus)
         maxcpu = hostcpus;
 
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 3ac5282..74e43b8 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -1902,15 +1902,13 @@ qemuPrepareCpumap(struct qemud_driver *driver,
                   virBitmapPtr nodemask)
 {
     int i, hostcpus, maxcpu = QEMUD_CPUMASK_LEN;
-    virNodeInfo nodeinfo;
     virBitmapPtr cpumap = NULL;
 
-    if (nodeGetInfo(NULL, &nodeinfo) < 0)
-        return NULL;
-
     /* setaffinity fails if you set bits for CPUs which
      * aren't present, so we have to limit ourselves */
-    hostcpus = VIR_NODEINFO_MAXCPUS(nodeinfo);
+    if ((hostcpus = nodeGetCPUCount()) < 0)
+        return NULL;
+
     if (maxcpu > hostcpus)
         maxcpu = hostcpus;
-- 
1.7.12.4

On 10/31/2012 11:20 AM, Viktor Mihajlovski wrote:
The drivers for QEMU and LXC use virNodeGetInfo to determine the number of host CPUs. This approach can lead to a wrong (too small) number if one or more CPUs are offline. It is better to use nodeGetCPUCount on Linux hosts. --- V2 Changes: Use nodeGetCPUCount as suggested by Eric, nodeGetCPUMap is too "heavyweight".
Hmm, while reading this patch series, I realized that my initial RPC implementation always requests the number of online processors over the wire, even though you've now proven that calling virNodeGetCPUMap(conn, NULL, NULL, 0) is useful. I'll see about getting in a followup patch to optimize the on-the-wire call to avoid wasted effort of collecting a bitmap just to determine the number of online CPUs when we really only care about the max cpu number. Meanwhile, on to this patch:
@@ -492,17 +492,15 @@ static int virLXCControllerSetupNUMAPolicy(virLXCControllerPtr ctrl)
 static int virLXCControllerSetupCpuAffinity(virLXCControllerPtr ctrl)
 {
     int hostcpus, maxcpu = CPU_SETSIZE;
-    virNodeInfo nodeinfo;
     virBitmapPtr cpumap, cpumapToSet;
VIR_DEBUG("Setting CPU affinity");
-    if (nodeGetInfo(NULL, &nodeinfo) < 0)
-        return -1;
-
     /* setaffinity fails if you set bits for CPUs which
      * aren't present, so we have to limit ourselves */
Question - does setaffinity fail if you request setting a CPU that is present but offline? In that case, we would need the CPU map instead of just the count of max cpus. But in the meantime, I'm okay with your patch. ACK, and will push shortly.

-- 
Eric Blake eblake@redhat.com +1-919-301-3266
Libvirt virtualization library http://libvirt.org

On 11/01/2012 04:00 AM, Eric Blake wrote:
On 10/31/2012 11:20 AM, Viktor Mihajlovski wrote:
    /* setaffinity fails if you set bits for CPUs which
     * aren't present, so we have to limit ourselves */
Question - does setaffinity fail if you request setting a CPU that is present but offline? In that case, we would need the CPU map instead of just the count of max cpus. But in the meantime, I'm okay with your patch.
It will fail in the case you describe. Of course, with the new API it is possible for a client to determine the host CPU map beforehand, allowing it to specify a valid affinity mask. The other question is whether an incorrect mask should be silently corrected by libvirt or whether a reported error would be more appropriate. All in all this needs more thought and discussion, IMO independent of this patch series.

-- 
Mit freundlichen Grüßen/Kind Regards
   Viktor Mihajlovski

IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294
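The clamp applied in patch 1/3 can be illustrated outside of libvirt. Below is a minimal Python sketch of the idea (not libvirt code; `clamp_affinity` and its arguments are hypothetical names for illustration):

```python
def clamp_affinity(requested, hostcpus):
    """Drop CPUs at or above the host CPU count from a requested
    affinity set, mirroring the maxcpu-vs-hostcpus clamp in
    virLXCControllerSetupCpuAffinity(): setaffinity fails if bits
    are set for CPUs that are not present."""
    clamped = {cpu for cpu in requested if cpu < hostcpus}
    if not clamped:
        # nothing left to pin to: the caller asked only for
        # CPUs the host does not have
        raise ValueError("no usable CPUs left after clamping")
    return clamped
```

Note that, as discussed above, this clamp only guards against out-of-range CPU bits; a CPU that is present but offline would still pass it, which is exactly why the host CPU map is needed for a complete check.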

Modified the places where virNodeGetInfo was used for the purpose of
obtaining the maximum node CPU number. Transparently falling back to
virNodeGetInfo in case of failure. Wrote a utility function
getPyNodeCPUCount for that purpose.

Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
---
V2 Changes: Implemented Eric Blake's suggestion to remove code bloat
introduced by first patch version. New helper function getPyNodeCPUCount
is now used to calculate the number of node CPUs.

 python/libvirt-override.c | 87 ++++++++++++++++++++++++++++-------------------
 1 file changed, 52 insertions(+), 35 deletions(-)

diff --git a/python/libvirt-override.c b/python/libvirt-override.c
index cd48227..3528997 100644
--- a/python/libvirt-override.c
+++ b/python/libvirt-override.c
@@ -253,6 +253,39 @@ cleanup:
     return NULL;
 }
 
+/*
+ * Utility function to retrieve the number of node CPUs present.
+ * It first tries virNodeGetCPUMap, which will return the
+ * number reliably, if available.
+ * As a fallback and for compatibility with backlevel libvirt
+ * versions virNodeGetInfo will be called to calculate the
+ * CPU number, which has the potential to return a too small
+ * number if some host CPUs are offline.
+ */
+static int
+getPyNodeCPUCount(virConnectPtr conn) {
+    int i_retval;
+    virNodeInfo nodeinfo;
+
+    LIBVIRT_BEGIN_ALLOW_THREADS;
+    i_retval = virNodeGetCPUMap(conn, NULL, NULL, 0);
+    LIBVIRT_END_ALLOW_THREADS;
+
+    if (i_retval < 0) {
+        /* fallback: use nodeinfo */
+        LIBVIRT_BEGIN_ALLOW_THREADS;
+        i_retval = virNodeGetInfo(conn, &nodeinfo);
+        LIBVIRT_END_ALLOW_THREADS;
+        if (i_retval < 0)
+            goto cleanup;
+
+        i_retval = VIR_NODEINFO_MAXCPUS(nodeinfo);
+    }
+
+cleanup:
+    return i_retval;
+}
+
 /************************************************************************
  *                                                                      *
  *                  Statistics                                          *
@@ -1338,22 +1371,18 @@ libvirt_virDomainGetVcpus(PyObject *self ATTRIBUTE_UNUSED,
     virDomainPtr domain;
     PyObject *pyobj_domain, *pyretval = NULL, *pycpuinfo = NULL, *pycpumap = NULL;
     PyObject *error = NULL;
-    virNodeInfo nodeinfo;
     virDomainInfo dominfo;
     virVcpuInfoPtr cpuinfo = NULL;
     unsigned char *cpumap = NULL;
     size_t cpumaplen, i;
-    int i_retval;
+    int i_retval, cpunum;
 
     if (!PyArg_ParseTuple(args, (char *)"O:virDomainGetVcpus",
                           &pyobj_domain))
         return NULL;
     domain = (virDomainPtr) PyvirDomain_Get(pyobj_domain);
 
-    LIBVIRT_BEGIN_ALLOW_THREADS;
-    i_retval = virNodeGetInfo(virDomainGetConnect(domain), &nodeinfo);
-    LIBVIRT_END_ALLOW_THREADS;
-    if (i_retval < 0)
+    if ((cpunum = getPyNodeCPUCount(virDomainGetConnect(domain))) < 0)
         return VIR_PY_INT_FAIL;
 
     LIBVIRT_BEGIN_ALLOW_THREADS;
@@ -1365,7 +1394,7 @@ libvirt_virDomainGetVcpus(PyObject *self ATTRIBUTE_UNUSED,
 
     if (VIR_ALLOC_N(cpuinfo, dominfo.nrVirtCpu) < 0)
         return PyErr_NoMemory();
 
-    cpumaplen = VIR_CPU_MAPLEN(VIR_NODEINFO_MAXCPUS(nodeinfo));
+    cpumaplen = VIR_CPU_MAPLEN(cpunum);
     if (xalloc_oversized(dominfo.nrVirtCpu, cpumaplen) ||
         VIR_ALLOC_N(cpumap, dominfo.nrVirtCpu * cpumaplen) < 0) {
         error = PyErr_NoMemory();
@@ -1423,11 +1452,11 @@ libvirt_virDomainGetVcpus(PyObject *self ATTRIBUTE_UNUSED,
         goto cleanup;
     }
     for (i = 0 ; i < dominfo.nrVirtCpu ; i++) {
-        PyObject *info = PyTuple_New(VIR_NODEINFO_MAXCPUS(nodeinfo));
+        PyObject *info = PyTuple_New(cpunum);
         int j;
         if (info == NULL)
             goto cleanup;
-        for (j = 0 ; j < VIR_NODEINFO_MAXCPUS(nodeinfo) ; j++) {
+        for (j = 0 ; j < cpunum ; j++) {
             PyObject *item = NULL;
             if ((item = PyBool_FromLong(VIR_CPU_USABLE(cpumap,
                                                        cpumaplen, i, j))) == NULL ||
                 PyTuple_SetItem(info, j, item) < 0) {
@@ -1467,9 +1496,8 @@ libvirt_virDomainPinVcpu(PyObject *self ATTRIBUTE_UNUSED,
     virDomainPtr domain;
     PyObject *pyobj_domain, *pycpumap;
     PyObject *ret = NULL;
-    virNodeInfo nodeinfo;
     unsigned char *cpumap;
-    int cpumaplen, i, vcpu, tuple_size;
+    int cpumaplen, i, vcpu, tuple_size, cpunum;
     int i_retval;
 
     if (!PyArg_ParseTuple(args, (char *)"OiO:virDomainPinVcpu",
@@ -1477,10 +1505,7 @@ libvirt_virDomainPinVcpu(PyObject *self ATTRIBUTE_UNUSED,
         return NULL;
     domain = (virDomainPtr) PyvirDomain_Get(pyobj_domain);
 
-    LIBVIRT_BEGIN_ALLOW_THREADS;
-    i_retval = virNodeGetInfo(virDomainGetConnect(domain), &nodeinfo);
-    LIBVIRT_END_ALLOW_THREADS;
-    if (i_retval < 0)
+    if ((cpunum = getPyNodeCPUCount(virDomainGetConnect(domain))) < 0)
         return VIR_PY_INT_FAIL;
 
     if (PyTuple_Check(pycpumap)) {
@@ -1492,7 +1517,7 @@ libvirt_virDomainPinVcpu(PyObject *self ATTRIBUTE_UNUSED,
         return ret;
     }
 
-    cpumaplen = VIR_CPU_MAPLEN(VIR_NODEINFO_MAXCPUS(nodeinfo));
+    cpumaplen = VIR_CPU_MAPLEN(cpunum);
     if (VIR_ALLOC_N(cpumap, cpumaplen) < 0)
         return PyErr_NoMemory();
 
@@ -1509,7 +1534,7 @@ libvirt_virDomainPinVcpu(PyObject *self ATTRIBUTE_UNUSED,
             VIR_UNUSE_CPU(cpumap, i);
     }
 
-    for (; i < VIR_NODEINFO_MAXCPUS(nodeinfo); i++)
+    for (; i < cpunum; i++)
         VIR_UNUSE_CPU(cpumap, i);
 
     LIBVIRT_BEGIN_ALLOW_THREADS;
@@ -1534,9 +1559,8 @@ libvirt_virDomainPinVcpuFlags(PyObject *self ATTRIBUTE_UNUSED,
     virDomainPtr domain;
     PyObject *pyobj_domain, *pycpumap;
     PyObject *ret = NULL;
-    virNodeInfo nodeinfo;
     unsigned char *cpumap;
-    int cpumaplen, i, vcpu, tuple_size;
+    int cpumaplen, i, vcpu, tuple_size, cpunum;
     unsigned int flags;
     int i_retval;
 
@@ -1545,10 +1569,7 @@ libvirt_virDomainPinVcpuFlags(PyObject *self ATTRIBUTE_UNUSED,
         return NULL;
     domain = (virDomainPtr) PyvirDomain_Get(pyobj_domain);
 
-    LIBVIRT_BEGIN_ALLOW_THREADS;
-    i_retval = virNodeGetInfo(virDomainGetConnect(domain), &nodeinfo);
-    LIBVIRT_END_ALLOW_THREADS;
-    if (i_retval < 0)
+    if ((cpunum = getPyNodeCPUCount(virDomainGetConnect(domain))) < 0)
         return VIR_PY_INT_FAIL;
 
     if (PyTuple_Check(pycpumap)) {
@@ -1560,7 +1581,7 @@ libvirt_virDomainPinVcpuFlags(PyObject *self ATTRIBUTE_UNUSED,
         return ret;
     }
 
-    cpumaplen = VIR_CPU_MAPLEN(VIR_NODEINFO_MAXCPUS(nodeinfo));
+    cpumaplen = VIR_CPU_MAPLEN(cpunum);
     if (VIR_ALLOC_N(cpumap, cpumaplen) < 0)
         return PyErr_NoMemory();
 
@@ -1577,7 +1598,7 @@ libvirt_virDomainPinVcpuFlags(PyObject *self ATTRIBUTE_UNUSED,
             VIR_UNUSE_CPU(cpumap, i);
     }
 
-    for (; i < VIR_NODEINFO_MAXCPUS(nodeinfo); i++)
+    for (; i < cpunum; i++)
         VIR_UNUSE_CPU(cpumap, i);
 
     LIBVIRT_BEGIN_ALLOW_THREADS;
@@ -1599,23 +1620,19 @@ libvirt_virDomainGetVcpuPinInfo(PyObject *self ATTRIBUTE_UNUSED,
                                 PyObject *args) {
     virDomainPtr domain;
     PyObject *pyobj_domain, *pycpumaps = NULL;
-    virNodeInfo nodeinfo;
     virDomainInfo dominfo;
     unsigned char *cpumaps;
     size_t cpumaplen, vcpu, pcpu;
     unsigned int flags;
-    int i_retval;
+    int i_retval, cpunum;
 
     if (!PyArg_ParseTuple(args, (char *)"Oi:virDomainGetVcpuPinInfo",
                           &pyobj_domain, &flags))
         return NULL;
 
     domain = (virDomainPtr) PyvirDomain_Get(pyobj_domain);
 
-    LIBVIRT_BEGIN_ALLOW_THREADS;
-    i_retval = virNodeGetInfo(virDomainGetConnect(domain), &nodeinfo);
-    LIBVIRT_END_ALLOW_THREADS;
-    if (i_retval < 0)
-        return VIR_PY_NONE;
+    if ((cpunum = getPyNodeCPUCount(virDomainGetConnect(domain))) < 0)
+        return VIR_PY_INT_FAIL;
 
     LIBVIRT_BEGIN_ALLOW_THREADS;
     i_retval = virDomainGetInfo(domain, &dominfo);
@@ -1623,7 +1640,7 @@ libvirt_virDomainGetVcpuPinInfo(PyObject *self ATTRIBUTE_UNUSED,
     if (i_retval < 0)
         return VIR_PY_NONE;
 
-    cpumaplen = VIR_CPU_MAPLEN(VIR_NODEINFO_MAXCPUS(nodeinfo));
+    cpumaplen = VIR_CPU_MAPLEN(cpunum);
     if (xalloc_oversized(dominfo.nrVirtCpu, cpumaplen) ||
         VIR_ALLOC_N(cpumaps, dominfo.nrVirtCpu * cpumaplen) < 0)
         goto cleanup;
@@ -1639,11 +1656,11 @@ libvirt_virDomainGetVcpuPinInfo(PyObject *self ATTRIBUTE_UNUSED,
         goto cleanup;
 
     for (vcpu = 0; vcpu < dominfo.nrVirtCpu; vcpu++) {
-        PyObject *mapinfo = PyTuple_New(VIR_NODEINFO_MAXCPUS(nodeinfo));
+        PyObject *mapinfo = PyTuple_New(cpunum);
         if (mapinfo == NULL)
             goto cleanup;
 
-        for (pcpu = 0; pcpu < VIR_NODEINFO_MAXCPUS(nodeinfo); pcpu++) {
+        for (pcpu = 0; pcpu < cpunum; pcpu++) {
             PyTuple_SetItem(mapinfo, pcpu,
                             PyBool_FromLong(VIR_CPU_USABLE(cpumaps, cpumaplen, vcpu, pcpu)));
         }
-- 
1.7.12.4

Modified the places where virNodeGetInfo was used for the purpose of
obtaining the maximum node CPU number. Transparently falling back to
virNodeGetInfo in case of failure. Wrote utility function
vshNodeGetCPUCount to compute node CPU number.

Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
---
V2 Changes: Implemented Eric Blake's suggestion to remove code bloat
introduced by first patch version. New helper function vshNodeGetCPUCount
is now used to calculate the number of node CPUs.

 tools/virsh-domain.c | 32 +++++++++++++++++++++++---------
 1 file changed, 23 insertions(+), 9 deletions(-)

diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c
index 255669f..59289f1 100644
--- a/tools/virsh-domain.c
+++ b/tools/virsh-domain.c
@@ -126,6 +126,26 @@ vshDomainVcpuStateToString(int state)
 }
 
 /*
+ * Determine number of CPU nodes present by trying
+ * virNodeGetCPUMap and falling back to virNodeGetInfo
+ * if needed.
+ */
+static int
+vshNodeGetCPUCount(virConnectPtr conn)
+{
+    int ret;
+    virNodeInfo nodeinfo;
+
+    if ((ret = virNodeGetCPUMap(conn, NULL, NULL, 0)) < 0) {
+        /* fall back to nodeinfo */
+        if (virNodeGetInfo(conn, &nodeinfo) == 0) {
+            ret = VIR_NODEINFO_MAXCPUS(nodeinfo);
+        }
+    }
+    return ret;
+}
+
+/*
  * "attach-device" command
  */
 static const vshCmdInfo info_attach_device[] = {
@@ -4497,7 +4517,6 @@ cmdVcpuinfo(vshControl *ctl, const vshCmd *cmd)
 {
     virDomainInfo info;
     virDomainPtr dom;
-    virNodeInfo nodeinfo;
     virVcpuInfoPtr cpuinfo;
     unsigned char *cpumaps;
     int ncpus, maxcpu;
@@ -4508,7 +4527,7 @@ cmdVcpuinfo(vshControl *ctl, const vshCmd *cmd)
     if (!(dom = vshCommandOptDomain(ctl, cmd, NULL)))
         return false;
 
-    if (virNodeGetInfo(ctl->conn, &nodeinfo) != 0) {
+    if ((maxcpu = vshNodeGetCPUCount(ctl->conn)) < 0) {
         virDomainFree(dom);
         return false;
     }
@@ -4519,7 +4538,6 @@ cmdVcpuinfo(vshControl *ctl, const vshCmd *cmd)
     }
 
     cpuinfo = vshMalloc(ctl, sizeof(virVcpuInfo)*info.nrVirtCpu);
-    maxcpu = VIR_NODEINFO_MAXCPUS(nodeinfo);
     cpumaplen = VIR_CPU_MAPLEN(maxcpu);
     cpumaps = vshMalloc(ctl, info.nrVirtCpu * cpumaplen);
 
@@ -4645,7 +4663,6 @@ cmdVcpuPin(vshControl *ctl, const vshCmd *cmd)
 {
     virDomainInfo info;
     virDomainPtr dom;
-    virNodeInfo nodeinfo;
     int vcpu = -1;
     const char *cpulist = NULL;
     bool ret = true;
@@ -4695,7 +4712,7 @@ cmdVcpuPin(vshControl *ctl, const vshCmd *cmd)
         return false;
     }
 
-    if (virNodeGetInfo(ctl->conn, &nodeinfo) != 0) {
+    if ((maxcpu = vshNodeGetCPUCount(ctl->conn)) < 0) {
         virDomainFree(dom);
         return false;
     }
@@ -4712,7 +4729,6 @@ cmdVcpuPin(vshControl *ctl, const vshCmd *cmd)
         return false;
     }
 
-    maxcpu = VIR_NODEINFO_MAXCPUS(nodeinfo);
     cpumaplen = VIR_CPU_MAPLEN(maxcpu);
 
     /* Query mode: show CPU affinity information then exit.*/
@@ -4864,7 +4880,6 @@ static bool
 cmdEmulatorPin(vshControl *ctl, const vshCmd *cmd)
 {
     virDomainPtr dom;
-    virNodeInfo nodeinfo;
     const char *cpulist = NULL;
     bool ret = true;
     unsigned char *cpumap = NULL;
@@ -4905,12 +4920,11 @@ cmdEmulatorPin(vshControl *ctl, const vshCmd *cmd)
     }
     query = !cpulist;
 
-    if (virNodeGetInfo(ctl->conn, &nodeinfo) != 0) {
+    if ((maxcpu = vshNodeGetCPUCount(ctl->conn)) < 0) {
         virDomainFree(dom);
         return false;
     }
 
-    maxcpu = VIR_NODEINFO_MAXCPUS(nodeinfo);
     cpumaplen = VIR_CPU_MAPLEN(maxcpu);
 
     /* Query mode: show CPU affinity information then exit.*/
-- 
1.7.12.4

On Wed, Oct 31, 2012 at 06:20:58PM +0100, Viktor Mihajlovski wrote:
Modified the places where virNodeGetInfo was used for the purpose of obtaining the maximum node CPU number. Transparently falling back to virNodeGetInfo in case of failure. Wrote utility function vshNodeGetCPUCount to compute node CPU number.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com> --- V2 Changes: Implemented Eric Blake's suggestion to remove code bloat introduced by first patch version. New helper function vshNodeGetCPUCount is now used to calculate the number of node CPUs.
tools/virsh-domain.c | 32 +++++++++++++++++++++++--------- 1 file changed, 23 insertions(+), 9 deletions(-)
diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c
index 255669f..59289f1 100644
--- a/tools/virsh-domain.c
+++ b/tools/virsh-domain.c
@@ -126,6 +126,26 @@ vshDomainVcpuStateToString(int state)
 }
 /*
+ * Determine number of CPU nodes present by trying
+ * virNodeGetCPUMap and falling back to virNodeGetInfo
+ * if needed.
+ */
+static int
+vshNodeGetCPUCount(virConnectPtr conn)
+{
+    int ret;
+    virNodeInfo nodeinfo;
+
+    if ((ret = virNodeGetCPUMap(conn, NULL, NULL, 0)) < 0) {
+        /* fall back to nodeinfo */
+        if (virNodeGetInfo(conn, &nodeinfo) == 0) {
+            ret = VIR_NODEINFO_MAXCPUS(nodeinfo);
+        }
Isn't VIR_NODEINFO_MAXCPUS buggy? Either don't fall back to nodeinfo or fix it.

-- 
Thanks,
Hu Tao

On 10/31/2012 08:58 PM, Hu Tao wrote:
On Wed, Oct 31, 2012 at 06:20:58PM +0100, Viktor Mihajlovski wrote:
Modified the places where virNodeGetInfo was used for the purpose of obtaining the maximum node CPU number. Transparently falling back to virNodeGetInfo in case of failure. Wrote utility function vshNodeGetCPUCount to compute node CPU number.
+static int
+vshNodeGetCPUCount(virConnectPtr conn)
+{
+    int ret;
+    virNodeInfo nodeinfo;
+
+    if ((ret = virNodeGetCPUMap(conn, NULL, NULL, 0)) < 0) {
+        /* fall back to nodeinfo */
+        if (virNodeGetInfo(conn, &nodeinfo) == 0) {
+            ret = VIR_NODEINFO_MAXCPUS(nodeinfo);
+        }
Isn't VIR_NODEINFO_MAXCPUS buggy? Either don't fall back to nodeinfo or fix it.
Hmm, and I just realized another issue - on the RHEL 5 box I tested, there is no /sys/devices/system/cpu/possible, so virNodeGetCPUCount() currently fails, even though virNodeGetInfo() succeeded. I'm going to hold off pushing this series until after 1.0.0, to avoid any chance of a last-minute unintentional regression.

You were asking about VIR_NODEINFO_MAXCPUS:

#define VIR_NODEINFO_MAXCPUS(nodeinfo) ((nodeinfo).nodes*(nodeinfo).sockets*(nodeinfo).cores*(nodeinfo).threads)

I can confirm that virNodeGetInfo misbehaves if enough trailing cpus are offline, and therefore agree that we need to fix that API:

# virsh nodeinfo
CPU model:           x86_64
CPU(s):              2
CPU frequency:       2801 MHz
CPU socket(s):       1
Core(s) per socket:  2
Thread(s) per core:  1
NUMA cell(s):        1
Memory size:         3941792 KiB

# echo 0 > /sys/devices/system/cpu/cpu1/online
# virsh nodeinfo
CPU model:           x86_64
CPU(s):              1
CPU frequency:       2801 MHz
CPU socket(s):       1
Core(s) per socket:  1
Thread(s) per core:  1
NUMA cell(s):        1
Memory size:         3941792 KiB

That just changed things from 2 to 1. No fun.

-- 
Eric Blake eblake@redhat.com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
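The sysfs file mentioned above holds a kernel cpulist string such as "0-7" or "0,2-3", and the possible-CPU count derived from it is the highest listed CPU id plus one. A minimal Python sketch of that parsing (an illustration of the format, not the libvirt implementation):

```python
def possible_cpu_count(cpulist):
    """Return the number of possible CPUs encoded in a kernel cpulist
    string as found in /sys/devices/system/cpu/possible.
    Examples of the format: '0-7', '0', '0,2-3'."""
    highest = 0
    for part in cpulist.strip().split(','):
        # each comma-separated part is a single id or a 'lo-hi' range;
        # the last '-'-separated token is the highest id in the part
        highest = max(highest, int(part.split('-')[-1]))
    return highest + 1
```

On the RHEL 5 box above the file simply does not exist, so any count derived this way needs the virNodeGetInfo fallback regardless.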

On 11/01/2012 04:47 AM, Eric Blake wrote:
On 10/31/2012 08:58 PM, Hu Tao wrote:
Isn't VIR_NODEINFO_MAXCPUS buggy? Either don't fall back to nodeinfo or fix it.

We will always have to fall back if we talk to a backlevel libvirt to retain compatibility, see below.
Hmm, and I just realized another issue - on the RHEL 5 box I tested, there is no /sys/devices/system/cpu/possible, so virNodeGetCPUCount() currently fails, even though virNodeGetInfo() succeeded. I'm going to hold off pushing this series until after 1.0.0, to avoid any chance of a last-minute unintentional regression.

Too bad I didn't realize this before; it requires another fallback to nodeGetInfo (at least there are no offline CPUs on 5.x). What do you think of pushing 2/3 and 3/3, which are not prone to regression, and letting me rework 1/3 to include the fallback, then deciding whether to pick it up for 1.0.0 or after?
You were asking about VIR_NODEINFO_MAXCPUS:
#define VIR_NODEINFO_MAXCPUS(nodeinfo) ((nodeinfo).nodes*(nodeinfo).sockets*(nodeinfo).cores*(nodeinfo).threads)
I can confirm that virNodeGetInfo misbehaves if enough trailing cpus are offline, and therefore agree that we need to fix that API:
Actually, this is where I initially started off; see the thread at https://www.redhat.com/archives/libvir-list/2012-October/msg00399.html and specifically Daniel's comments on compatibility. Obviously my first attempt was flawed/naive, since it broke binary compatibility.

One possible attempt to fix virNodeGetInfo would have been to change the meaning of the virNodeInfo.cpus field. But that would be semantically incorrect, as the field is documented as the number of active CPUs. Fixing the core/socket/thread detection doesn't seem possible using the sysfs interfaces. Now, the next straight-forward thing might have been a virNodeGetInfo2 or similar, but I thought an API for the host CPU map might be more versatile in the long run.

All that said ... it seems that we have to live with the flawed semantics of VIR_NODEINFO_MAXCPUS. This series should alleviate the problem while still retaining the old semantics (by means of fallback) even if a new client talks to an old server.

-- 
Mit freundlichen Grüßen/Kind Regards
   Viktor Mihajlovski

On 11/01/2012 02:05 AM, Viktor Mihajlovski wrote:
You were asking about VIR_NODEINFO_MAXCPUS:
#define VIR_NODEINFO_MAXCPUS(nodeinfo) ((nodeinfo).nodes*(nodeinfo).sockets*(nodeinfo).cores*(nodeinfo).threads)
I can confirm that virNodeGetInfo misbehaves if enough trailing cpus are offline, and therefore agree that we need to fix that API:
Actually, this is where I initially started off, see the thread at https://www.redhat.com/archives/libvir-list/2012-October/msg00399.html and specifically Daniel's comments on compatibility.
Obviously my first attempt was flawed/naive since it was breaking binary compatibility. One possible attempt to fix virGetNodeInfo would have been to change the meaning of the virNodeInfo.cpus field.
No, we can't change the meaning of the cpus field - it must remain the number of active CPUs.
But that would be semantically incorrect as the field is denominated as the number of active CPUs. Fixing the core/socket/thread detection doesn't seem possible using the sysfs interfaces.
Why not? We just proved with nodeGetCPUCount that it is possible to determine the number of possible cpus even when some of the cores/threads are offline. That just means our core/socket/thread detection code needs to be aware of offline cpus, even if it can't determine their complete topology, so that it at least doesn't underestimate the number of possible cores.
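The "don't underestimate" requirement could, for example, be met by never reporting fewer maximum CPUs than the kernel says are possible. A hedged Python sketch of that idea (a hypothetical helper for illustration, not the actual libvirt fix, which would live in the nodeinfo detection code):

```python
def max_cpus(nodes, sockets, cores, threads, possible_cpus):
    """Take the larger of the detected topology product
    (nodes*sockets*cores*threads, what VIR_NODEINFO_MAXCPUS computes)
    and the kernel's possible-CPU count, so offline CPUs can no
    longer shrink the reported maximum."""
    topology = nodes * sockets * cores * threads
    return max(topology, possible_cpus)
```

With the nodeinfo example earlier in the thread, offlining cpu1 shrinks the detected topology to 1*1*1*1 = 1, but a possible-CPU count of 2 keeps the result at 2.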
Now, the next straight-forward thing might have been a virNodeGetInfo2 or similar but I thought an API for the host CPU map might be more versatile in the long run.
Both APIs are useful. We're a bit constrained with virNodeGetInfo, but I still think that it should return the right value for VIR_NODEINFO_MAXCPUS.
All that said ... it seems that we have to live with the flawed semantics of VIR_NODEINFO_MAXCPUS.
I'm not convinced of that.
This series should alleviate this problem while still retaining the old semantics (by means of fallback) even if a new client talks to an old server.
Implementation wise, from a newer client, we KNOW that if the new interface is present, the cpu count is accurate (well, that won't be true until we fix the new interface to use fallbacks on RHEL 5, so getting that to work on RHEL 5 is my priority today before 1.0.0 is cut). We also know that if the new interface is not present, then virNodeGetInfo might under-estimate the number of cpus, but there is nothing we can do about it, because it is an older libvirt; at the same time, we know it is the best we can do, so we must use the fallback.

However, for older clients connecting to newer libvirt, where the older client only knows how to use the older interface, we might as well make newer libvirt give them correct information.

-- 
Eric Blake eblake@redhat.com +1-919-301-3266

On 11/01/2012 02:06 PM, Eric Blake wrote:
On 11/01/2012 02:05 AM, Viktor Mihajlovski wrote:
But that would be semantically incorrect as the field is denominated as the number of active CPUs. Fixing the core/socket/thread detection doesn't seem possible using the sysfs interfaces.
Why not? We just proved with nodeGetCPUCount that it is possible to determine the number of possible cpus even when some of the cores/threads are offline. That just means our core/socket/thread detection code needs to be aware of offline cpus, even if it can't determine their complete topology, so that it at least doesn't underestimate the number of possible cores.
Well, with offline CPUs we would be forced to guess the topology in a worst-case manner (i.e. does the offline CPU add a new socket, core or thread?), which will result in a too high VIR_NODEINFO_MAXCPUS value. I haven't thought through the consequences of that, though...

-- 
Mit freundlichen Grüßen/Kind Regards
   Viktor Mihajlovski
participants (3)
-
Eric Blake
-
Hu Tao
-
Viktor Mihajlovski