
On 12.02.2015 20:25, Eduardo Habkost wrote:
On Wed, Feb 11, 2015 at 05:09:01PM +0100, Michal Privoznik wrote:
On 11.02.2015 16:47, Daniel P. Berrange wrote:
On Wed, Feb 11, 2015 at 04:31:53PM +0100, Michal Privoznik wrote:
There are two reasons why we query & check the supported capabilities from QEMU:
1. There are multiple possible CLI args for the same feature and we need to choose the "best" one to use
2. The feature is not supported and we want to give the caller a better error message than they'd get from QEMU
I'm unclear from the bug which scenario applies here.
If it is scenario 2 though, I'd just mark it as CANTFIX or WONTFIX, as no matter what we do the user would get an error. It is not worth making our capability matrix a factor of 10+ bigger just to get a better error message.
If it is scenario 1, I think the burden is on QEMU to solve. The memory-backend-{file,ram} CLI flags shouldn't be tied to guest machine types, as they are backend config setup options that should not impact guest ABI.
It's somewhere in between 1 and 2. Back in RHEL-7.0 days libvirt would have created a guest with:
-numa node,...,mem=1337
But if qemu reports that it supports memory-backend-ram, libvirt tries to use it:
-object memory-backend-ram,id=ram-node0,size=1337M,... \
-numa node,...,memdev=ram-node0
This breaks migration to the newer qemu shipped in RHEL-7.1. If qemu reported the correct value, we could generate the correct command line and migration would succeed. However, part of the fault is ours: we are not asking the right question in the first place.
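(For illustration only -- the probing boils down to asking whether the object type exists at all; the domain name and the abbreviated reply below are made up:)

$ virsh qemu-monitor-command example-vm --pretty \
    '{"execute": "qom-list-types",
      "arguments": {"implements": "memory-backend", "abstract": false}}'
{"return": [{"name": "memory-backend-ram"}, {"name": "memory-backend-file"}]}

A "yes" here says nothing about whether memdev= is the right choice for the guest being migrated in, which is the question that actually matters.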
I understand that RHEL-7.1 QEMU is not providing enough data for libvirt to detect this before it is too late. What I am missing here is: why wasn't commit f309db1f4d51009bad0d32e12efc75530b66836b enough to fix this specific case?
The numa pinning can be expressed in libvirt this way:

<numatune>
  <memory mode='strict' nodeset='0-7'/>
  <memnode cellid='0' mode='preferred' nodeset='3'/>
  <memnode cellid='2' mode='strict' nodeset='1-2,5,7'/>
</numatune>

This says: pin guest node #0 onto host node #3, and guest node #2 onto host nodes #1-2,5 or 7. The rest of the guest numa nodes are placed onto host nodes #0-7. As soon as there is explicit pinning of a guest numa node onto host nodes (the <memnode/> element), memory-backend-ram is required. However, if <numatune/> has only the single <memory/> child, we can still guarantee the requested configuration via CGroups and don't necessarily need memory-backend-ram. The patch you've referred to was incomplete in this respect. Moreover, it was buggy: it allowed combining bare -numa and memory-backend-ram at the same time (which is not allowed).

Michal
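PS: roughly what the two cases map to on the qemu command line (the ids, sizes and cpus= values below are made up for illustration):

# <memnode/> present: the per-node binding has to travel with the guest node,
# so memory-backend-ram + memdev= is the only way to express it
-object memory-backend-ram,id=ram-node0,size=1024M,policy=preferred,host-nodes=3 \
-numa node,nodeid=0,cpus=0-1,memdev=ram-node0

# only <memory mode='strict' nodeset='0-7'/>: the old syntax is enough, and the
# 0-7 restriction can be enforced from outside via the cpuset controller
# (cpuset.mems) in the guest's cgroup
-numa node,nodeid=0,cpus=0-1,mem=1024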