On 12.02.2015 20:25, Eduardo Habkost wrote:
> On Wed, Feb 11, 2015 at 05:09:01PM +0100, Michal Privoznik wrote:
>> On 11.02.2015 16:47, Daniel P. Berrange wrote:
>>> On Wed, Feb 11, 2015 at 04:31:53PM +0100, Michal Privoznik wrote:
>>>>
>>>
>>> There are two reasons why we query & check the supported capabilities
>>> from QEMU
>>>
>>> 1. There are multiple possible CLI args for the same feature and
>>>    we need to choose the "best" one to use
>>>
>>> 2. The feature is not supported and we want to give the caller a
>>>    better error message than they'd get from QEMU
>>>
>>> I'm unclear from the bug which scenario applies here.
>>>
>>> If it is scenario 2 though, I'd just mark it as CANTFIX or WONTFIX,
>>> as no matter what we do the user would get an error. It is not worth
>>> making our capability matrix a factor of 10+ bigger just to get a
>>> better error message.
>>>
>>> If it is scenario 1, I think the burden is on QEMU to solve. The
>>> memory-backend-{file,ram} CLI flags shouldn't be tied to guest
>>> machine types, as they are backend config setup options that should
>>> not impact guest ABI.
>>
>> It's somewhere in between 1 and 2. Back in the RHEL-7.0 days libvirt
>> would have created a guest with:
>>
>>   -numa node,...,mem=1337
>>
>> But if qemu reports that it supports memory-backend-ram, libvirt tries
>> to use it:
>>
>>   -object memory-backend-ram,id=ram-node0,size=1337M,... \
>>   -numa node,...,memdev=ram-node0
>>
>> This breaks migration to the newer qemu shipped in RHEL-7.1. If qemu
>> reported the correct value, we could generate the correct command line
>> and migration would succeed. However, the fault is ours: we are not
>> asking the correct question in the first place.
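>>
>> (For illustration only: the "question" we ask today boils down to a
>> QMP type probe, roughly like the sketch below, followed by checking
>> whether "memory-backend-ram" appears in the returned list; the exact
>> exchange libvirt performs may differ:
>>
>>   { "execute": "qom-list-types" }
>>
>> That answers "does this qemu binary know the type at all", not
>> "should this machine type use memdev=".)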
> I understand that RHEL-7.1 QEMU is not providing enough data for
> libvirt to detect this before it is too late. What I am missing here
> is: why wasn't commit f309db1f4d51009bad0d32e12efc75530b66836b enough
> to fix this specific case?
The NUMA pinning can be expressed in libvirt this way:

  <numatune>
    <memory mode='strict' nodeset='0-7'/>
    <memnode cellid='0' mode='preferred' nodeset='3'/>
    <memnode cellid='2' mode='strict' nodeset='1-2,5,7'/>
  </numatune>

This says: pin guest node #0 onto host node #3, and guest node #2 onto
host nodes #1-2,5,7. The remaining guest NUMA nodes are placed onto
host nodes #0-7.
As long as there is explicit pinning of individual guest NUMA nodes
onto host nodes (the <memnode/> element), memory-backend-ram is
required. However, if <numatune/> has only the single <memory/> child,
we can still guarantee the requested configuration via CGroups and
don't necessarily need memory-backend-ram.
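
To illustrate the CGroups path (the cgroup path below is hypothetical;
the real one depends on the host's cgroup layout and the machine name),
the <memory/>-only case boils down to writing the host nodeset into the
cpuset controller of the guest's cgroup:

  # hypothetical cgroup v1 path for a guest named "guest"
  echo 0-7 > /sys/fs/cgroup/cpuset/machine.slice/machine-qemu\x2dguest.scope/cpuset.mems

No change to the qemu command line is needed for that, which is why
plain -numa node,mem= still works in this case.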
The patch you've referred to was incomplete in this case. Moreover, it
was buggy: it allowed combining bare -numa (mem=) with
memory-backend-ram at the same time, which is not allowed.
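
For example, it could produce something like this (made-up sizes, just
to show the invalid mix):

  -object memory-backend-ram,id=ram-node0,size=1337M \
  -numa node,nodeid=0,memdev=ram-node0 \
  -numa node,nodeid=1,mem=1337

qemu rejects a command line that uses memdev= for some nodes and mem=
for others.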
Michal