On 12.02.2015 20:25, Eduardo Habkost wrote:
> On Wed, Feb 11, 2015 at 05:09:01PM +0100, Michal Privoznik wrote:
>> On 11.02.2015 16:47, Daniel P. Berrange wrote:
>>> On Wed, Feb 11, 2015 at 04:31:53PM +0100, Michal Privoznik wrote:
>>>>
>>>
>>> There are two reasons why we query & check the supported capabilities
>>> from QEMU
>>>
>>> 1. There are multiple possible CLI args for the same feature and
>>>    we need to choose the "best" one to use
>>>
>>> 2. The feature is not supported and we want to give the caller a
>>>    better error message than they'd get from QEMU
>>>
>>> I'm unclear from the bug which scenario applies here.
>>>
>>> If it is scenario 2 though, I'd just mark it as CANTFIX or WONTFIX,
>>> as no matter what we do the user would get an error. It is not worth
>>> making our capability matrix a factor of 10+ bigger just to get a
>>> better error message.
>>>
>>> If it is scenario 1, I think the burden is on QEMU to solve. The
>>> memory-backend-{file,ram} CLI flags shouldn't be tied to guest
>>> machine types, as they are backend config setup options that should
>>> not impact guest ABI.
>>
>> It's somewhere in between 1 and 2. Back in the RHEL-7.0 days libvirt
>> would have created a guest with:
>>
>>   -numa node,...,mem=1337
>>
>> But if qemu reports that it supports memory-backend-ram, libvirt tries
>> to use it:
>>
>>   -object memory-backend-ram,id=ram-node0,size=1337M,... \
>>   -numa node,...,memdev=ram-node0
>>
>> This breaks migration to the newer qemu shipped in RHEL-7.1. If qemu
>> reported the correct value, we could generate the correct command line
>> and migration would succeed. However, the fault is ours: we are not
>> asking the correct question in the first place.
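>>
>> (For illustration only: the "question" we ask today boils down to a
>> QMP type probe, roughly like the sketch below, followed by checking
>> whether "memory-backend-ram" appears in the returned list; the exact
>> exchange libvirt performs may differ:
>>
>>   { "execute": "qom-list-types" }
>>
>> That answers "does this qemu binary know the type at all", not
>> "should this machine type use memdev=".)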
> I understand that RHEL-7.1 QEMU is not providing enough data for
> libvirt to detect this before it is too late. What I am missing here
> is: why wasn't commit f309db1f4d51009bad0d32e12efc75530b66836b enough
> to fix this specific case?
The NUMA pinning can be expressed in libvirt this way:

  <numatune>
    <memory mode='strict' nodeset='0-7'/>
    <memnode cellid='0' mode='preferred' nodeset='3'/>
    <memnode cellid='2' mode='strict' nodeset='1-2,5,7'/>
  </numatune>

This says: pin guest node #0 onto host node #3, and guest node #2 onto
host nodes #1-2,5,7. The remaining guest NUMA nodes are placed onto
host nodes #0-7.
As long as there is explicit pinning of individual guest NUMA nodes
onto host nodes (the <memnode/> element), memory-backend-ram is
required. However, if <numatune/> has only the single <memory/> child,
we can still guarantee the requested configuration via CGroups and
don't necessarily need memory-backend-ram.
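
To illustrate the CGroups path (the cgroup path below is hypothetical;
the real one depends on the host's cgroup layout and the machine name),
the <memory/>-only case boils down to writing the host nodeset into the
cpuset controller of the guest's cgroup:

  # hypothetical cgroup v1 path for a guest named "guest"
  echo 0-7 > /sys/fs/cgroup/cpuset/machine.slice/machine-qemu\x2dguest.scope/cpuset.mems

No change to the qemu command line is needed for that, which is why
plain -numa node,mem= still works in this case.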
The patch you've referred to was incomplete in this case. Moreover, it
was buggy: it allowed combining bare -numa (mem=) with
memory-backend-ram at the same time, which is not allowed.
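
For example, it could produce something like this (made-up sizes, just
to show the invalid mix):

  -object memory-backend-ram,id=ram-node0,size=1337M \
  -numa node,nodeid=0,memdev=ram-node0 \
  -numa node,nodeid=1,mem=1337

qemu rejects a command line that uses memdev= for some nodes and mem=
for others.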
Michal