On Thu, 20 Jul 2017 11:10:31 +0100
"Daniel P. Berrange" <berrange(a)redhat.com> wrote:
On Thu, Jul 20, 2017 at 11:29:26AM +0200, Wim Ten Have wrote:
> From: Wim ten Have <wim.ten.have(a)oracle.com>
>
> The QEMU driver can erroneously allocate more vpus to a domain
> than there are cpus in the domain if the <numa> element is used
> to describe <cpu> element topology. Fix this by calculating
> the number of cpus described in the <numa> element of a <cpu>
> element and comparing it to the number of vcpus. If the number
> of vcpus is greater than the number of cpus, cap the number of
> vcpus to the number of cpus.
>
> This patch also adds a small libvirt documentation update under
> the "CPU Allocation" section.
>
> Signed-off-by: Wim ten Have <wim.ten.have(a)oracle.com>
> ---
> docs/formatdomain.html.in | 9 ++++++++-
> src/conf/domain_conf.c | 14 +++++++++++---
> 2 files changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
> index 07208ee..3c85307 100644
> --- a/docs/formatdomain.html.in
> +++ b/docs/formatdomain.html.in
> @@ -501,7 +501,14 @@
> <dt><code>vcpu</code></dt>
> <dd>The content of this element defines the maximum number of virtual
> CPUs allocated for the guest OS, which must be between 1 and
> - the maximum supported by the hypervisor.
> + the maximum supported by the hypervisor. If a
<code>numa</code>
> + element is contained within the <code>cpu</code> element, the
> + number of virtual CPUs to be allocated is compared with the sum
> + of the <code>cpus</code> attributes across the
<code>cell</code>
> + elements within the <code>numa</code> element. If the number
of
> + virtual CPUs is greater than the sum of the <code>cpus</code>
> + attributes, the number of virtual CPUs is capped at sum of the
> + <code>cpus</code> attributes.
> <dl>
> <dt><code>cpuset</code></dt>
> <dd>
> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
> index 958a5b7..73d7f4f 100644
> --- a/src/conf/domain_conf.c
> +++ b/src/conf/domain_conf.c
> @@ -3844,12 +3844,20 @@ virDomainDefPostParseMemory(virDomainDefPtr def,
> unsigned long long numaMemory = 0;
> unsigned long long hotplugMemory = 0;
>
> - /* Attempt to infer the initial memory size from the sum NUMA memory sizes
> - * in case ABI updates are allowed or the <memory> element wasn't
specified */
> + /* Attempt to infer the initial memory size from the sum NUMA memory
> + * sizes in case ABI updates are allowed or the <memory> element
> + * wasn't specified. Also cap the vcpu count to the sum of NUMA cpus
> + * allocated for all nodes. */
> if (def->mem.total_memory == 0 ||
> parseFlags & VIR_DOMAIN_DEF_PARSE_ABI_UPDATE ||
> - parseFlags & VIR_DOMAIN_DEF_PARSE_ABI_UPDATE_MIGRATION)
> + parseFlags & VIR_DOMAIN_DEF_PARSE_ABI_UPDATE_MIGRATION) {
> + unsigned int numaVcpus = 0;
> +
> + if ((numaVcpus = virDomainNumaGetCPUCountTotal(def->numa)))
> + virDomainDefSetVcpus(def, numaVcpus);
> +
> numaMemory = virDomainNumaGetMemorySize(def->numa);
> + }
AFAICT, this scenario is simply a user configuration mistake, and so we
should report an error message to them to fix it.
Not to push but I think this is the correct way ... O:-).
Ok, perhaps the documentation note/commit message should be slightly
rewritten aswell as the altered comment on specific code section.
The change is _NOT_ changing final guest provided 'maxvcpus' but
'vcpus' instead. This means that it adds the
"current='#count'" under
the vcpu element if such is less then vcpu 'maxvcpus' provided count.
If equal there is no issue and if larger there is indeed a correct
error message (exceptional condition).
Imagine simpel domain description _WITHOUT_ <numa> and below <vcpu>
element description;
<vcpu placement='static' current='4'>16</vcpu>
This nicely creates a guest with 4 domain CPUs added, where the
administrator can "hot-add" an additional 12 CPUs making the full
'maxvcpus' defined 16. "hot-add" by virsh;
virsh # setvcpus kvm26 16
Without the fix under former <numa> domain description libvirt would
bring whole '16' vcpus to the guest, and if there was a current value
given all by current on top of the <numa> <cell ... to NUMA-node0
for that matter :-( so wrong;
<vcpu placement='static'>16</vcpu>
..
<numa>
<cell id='0' cpus='0' memory='262144'
unit='KiB'/>
<cell id='1' cpus='1' memory='262144'
unit='KiB'/>
<cell id='2' cpus='2' memory='262144'
unit='KiB'/>
<cell id='3' cpus='3' memory='262144'
unit='KiB'/>
</numa>
With the fix, its post configuration action will now nicely rewrite
the <vcpu ... current='#count'> element as shown below;
<vcpu placement='static' current='4'>16</vcpu>
..
<numa>
<cell id='0' cpus='0' memory='262144'
unit='KiB'/>
<cell id='1' cpus='1' memory='262144'
unit='KiB'/>
<cell id='2' cpus='2' memory='262144'
unit='KiB'/>
<cell id='3' cpus='3' memory='262144'
unit='KiB'/>
</numa>
In case "current='#count'" description is given where it does not
match the <numa> <cell ... #cpus> count then such is corrected to
cap the sum of all <numa> <cell ... #cpus>.
Perhaps that occurance should be rejected with an error message, then
such is not exceeding the domain administrators written <vcpu> 'maxvcpus'
element being '16' within my example above. That is a hard limit and
of course rejected with an error.
I will not argue further ... O:-) I'll wait and if it is a _NO_ then
it is a _NO_!
Regards,
- Wim.